Results 1  10
of
52
Fast and Scalable Priority Queue Architecture for HighSpeed Network Switches
, 2000
"... In this paper, we present a fast and scalable pipelined priority queue architecture for use in highperformance switches with support for finegrained quality of service (QoS) guarantees. Priority queues are used to implement highestpriorityfirst scheduling policies. Our hardware architecture i ..."
Abstract

Cited by 40 (9 self)
 Add to MetaCart
In this paper, we present a fast and scalable pipelined priority queue architecture for use in highperformance switches with support for finegrained quality of service (QoS) guarantees. Priority queues are used to implement highestpriorityfirst scheduling policies. Our hardware architecture is based on a new data structure called a Pipelined heap, or Pheap for short. This data structure enables the pipelining of the enqueue and dequeue operations, thereby allowing these operations to execute in essentially constant time. In addition to being very fast, the architecture also scales very well to a large number of priority levels and to large queue sizes. We give a detailed description of this new data structure, the associated algorithms and the corresponding hardware implementation. We have implemented this new architecture using a 0.35 micron CMOS technology.
An Efficient Algorithm for Concurrent Priority Queue Heaps
, 1994
"... We present a new algorithm for concurrent access to arraybased priority queue heaps. Deletions proceed topdown as they do in a previous algorithm due to Rao and Kumar [6], but insertions proceed bottomup, and consecutive insertions use a bitreversal technique to scatter accesses across the fring ..."
Abstract

Cited by 27 (0 self)
 Add to MetaCart
(Show Context)
We present a new algorithm for concurrent access to arraybased priority queue heaps. Deletions proceed topdown as they do in a previous algorithm due to Rao and Kumar [6], but insertions proceed bottomup, and consecutive insertions use a bitreversal technique to scatter accesses across the fringe of the tree, to reduce contention. Because insertions do not have to traverse the entire height of the tree (as they do in previous work), as many as O(M) operations can proceed in parallel, rather than O(logM) on a heap of size M. Experimental results on a Silicon Graphics Challenge multiprocessor demonstrate good overall performance for the new algorithm on small heaps, and significant performance improvements over known alternatives on large heaps with mixed insertion/deletion workloads.
Selection on the BulkSynchronous Parallel Model with Applications to Priority Queues
 IN PROCEEDINGS OF THE 1996 INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS
, 1996
"... In this paper we present a new randomized selection algorithm on the BulkSynchronous Parallel (BSP) model of computation, along with an application of this algorithm to dynamic data structures, namely Parallel Priority Queues (PPQs). We show that our methods improve previous results upon both the c ..."
Abstract

Cited by 19 (7 self)
 Add to MetaCart
In this paper we present a new randomized selection algorithm on the BulkSynchronous Parallel (BSP) model of computation, along with an application of this algorithm to dynamic data structures, namely Parallel Priority Queues (PPQs). We show that our methods improve previous results upon both the communication requirements and the amount of parallel slack required to achieve optimal performance. We also establish that optimality to within small multiplicative constant factors can be achieved for a wide range of parallel machines. While these algorithms are fairly simple themselves, descriptions of their performance in terms of the BSP parameters is somewhat involved. The main reward of quantifying these complications is that it allows transportable software to be written for parallel machines that fit the model. We also present experimental results for the selection algorithm that reinforce our claims. 1 Introduction and the BSP Model The main technical contribution of this work is ...
Fast Priority Queues for Parallel BranchandBound
 In Workshop on Algorithms for Irregularly Structured Problems, number 980 in LNCS
, 1995
"... . Currently used parallel best first branchandbound algorithms either suffer from contention at a centralized priority queue or can only approximate the best first strategy. Bottleneck free algorithms for parallel priority queues are known but they cannot be implemented very efficiently on contemp ..."
Abstract

Cited by 17 (2 self)
 Add to MetaCart
(Show Context)
. Currently used parallel best first branchandbound algorithms either suffer from contention at a centralized priority queue or can only approximate the best first strategy. Bottleneck free algorithms for parallel priority queues are known but they cannot be implemented very efficiently on contemporary machines. We present quite simple randomized algorithms for parallel priority queues on distributed memory machines. For branchandbound they are asymptotically as efficient as previously known PRAM algorithms with high probability. The simplest versions require not much more communication than the approximated branchandbound algorithm of Karp and Zhang. Keywords: Analysis of randomized algorithms, distributed memory, load balancing, median selection, parallel best first branchandbound, parallel pritority queue. 1 Introduction Branchandbound search is an important technique for many combinatorial optimization problems. Since it can be a quite time consuming technique, paralleli...
Parallelism and Locality in Priority Queues
 In Sixth IEEE Sypmposium on Parallel and Distributed Processing
, 1994
"... We explore two ways of incorporating parallelism into priority queues. The first is to speed up the execution of individual priority operations so that they can be performed one operation per time step, unlike sequential implementations which require O(log N ) time steps per operation for an N eleme ..."
Abstract

Cited by 16 (0 self)
 Add to MetaCart
We explore two ways of incorporating parallelism into priority queues. The first is to speed up the execution of individual priority operations so that they can be performed one operation per time step, unlike sequential implementations which require O(log N ) time steps per operation for an N element heap. We give an optimal parallel implementation that uses a linear array of O(log N ) processors. Second, we consider parallel operations on the priority queue. We show that using a ddimensional array (constant d) of P processors we can insert or delete the smallest P elements from a heap in time O(P 1=d log 1\Gamma1=d P ), where the number of elements in the heap is assumed to be polynomial in P . We also show a matching lower bound, based on communication complexity arguments, for a range of deterministic implementations. Finally, using randomization, we show that the time can be reduced to the optimal O(P 1=d ) time with high probability. 1 Introduction Much of the theoret...
Lazy Queue: A new approach to implementing the Pendingevent Set
"... In discrete event simulation, very often the future event set is represented by a priority queue. The data structure used to implement the queue and the way operations are performed on it are often crucial to the execution time of a simulation. In this paper a new priority queue implementation strat ..."
Abstract

Cited by 16 (0 self)
 Add to MetaCart
In discrete event simulation, very often the future event set is represented by a priority queue. The data structure used to implement the queue and the way operations are performed on it are often crucial to the execution time of a simulation. In this paper a new priority queue implementation strategy, the Lazy Queue, is presented. It is tailored to handle operations on the pending event set efficiently. The Lazy Queue is a kind of multilist data structure that delays the sorting process until a point near the time where the elements are to be dequeued. In this way, the time needed to sort new elements in the queue is reduced. We have performed several experiments comparing queue access times with the access times of the implicit heap and the calendar queue. Our experimental results indicate that the Lazy Queue is superior to these priority queue implementations. Key words: Discrete Event Simulation, Priority Queue, Event List implementation, performance measurement. 1 Introduction...
Randomized Priority Queues for Fast Parallel Access
 Journal of Parallel and Distributed Computing
, 1997
"... Applications like parallel search or discrete event simulation often assign priority or importance to pieces of work. An effective way to exploit this for parallelization is to use a priority queue data structure for scheduling the work; but a bottleneck free implementation of parallel priority ..."
Abstract

Cited by 16 (1 self)
 Add to MetaCart
(Show Context)
Applications like parallel search or discrete event simulation often assign priority or importance to pieces of work. An effective way to exploit this for parallelization is to use a priority queue data structure for scheduling the work; but a bottleneck free implementation of parallel priority queue access by many processors is required to make this approach scalable. We present simple and portable randomized algorithms for parallel priority queues on distributed memory machines with fully distributed storage. Accessing O(n) out of m elements on an nprocessor network with diameter d requires amortized time O with high probability for many network types. On logarithmic diameter networks, the algorithms are as fast as the best previously known EREWPRAM methods. Implementations demonstrate that the approach is already useful for medium scale parallelism.
Parallel Priority Queues
, 1991
"... This paper introduces the Parallel Priority Queue (PPQ) abstract data type. A PPQ stores a set of integervalued items and provides operations such as insertion of n new items or deletion of the n smallest ones. Algorithms for realizing PPQ operations on an nprocessor CREWPRAM are based on two new ..."
Abstract

Cited by 15 (1 self)
 Add to MetaCart
This paper introduces the Parallel Priority Queue (PPQ) abstract data type. A PPQ stores a set of integervalued items and provides operations such as insertion of n new items or deletion of the n smallest ones. Algorithms for realizing PPQ operations on an nprocessor CREWPRAM are based on two new data structures, the nBandwidthHeap (nH) and the nBandwidth LeftistHeap (nL), that are obtained as extensions of the well known sequential binaryheap and leftistheap, respectively. Using these structures, it is shown that insertion of n new items in a PPQ of m elements can be performed in parallel time O(h + log n), where h = log m n , while deletion of the n smallest items can be performed in time O(h + log log n). Keywords Data structures, parallel algorithms, analysis of algorithms, heaps, PRAM model. This work has been partly supported by the Ministero della Pubblica Istruzione of Italy and by the C.N.R. project "Sistemi Informatici e Calcolo Parallelo" y Istituto di Ela...
Scalable Concurrent Priority Queue Algorithms
 In Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing
, 1999
"... This paper addresses the problem of designing bounded range priority queues, that is, queues that support a fixed range of priorities. Bounded range priority queues are fundamental in the design of modern multiprocessor algorithms  from the application level to lowest levels of the operating sy ..."
Abstract

Cited by 15 (3 self)
 Add to MetaCart
(Show Context)
This paper addresses the problem of designing bounded range priority queues, that is, queues that support a fixed range of priorities. Bounded range priority queues are fundamental in the design of modern multiprocessor algorithms  from the application level to lowest levels of the operating system kernel. While most of the available priority queue literature is directed at existing smallscale machines, we chose to evaluate algorithms on a broader concurrency scale using a simulated 256 node shared memory multiprocessor architecture similar to the MIT Alewife. Our empirical evidence suggests that the priority queue algorithms currently available in the literature do not scale. Based on these findings, we present two simple new algorithms, LinearFunnels and FunnelTree, that provide true scalability throughout the concurrency range. 1 Introduction Priority queues are a fundamental class of data structures used in the design of modern multiprocessor algorithms. Their uses r...
Skiplistbased Concurrent Priority Queues
, 2000
"... This paper addresses the problem of designing scalable concurrent priority queues for large scale multiprocessors – machines with up to several hundred processors. Priority queues are fundamental in the design of modern multiprocessor algorithms, with many classical applications ranging from numeric ..."
Abstract

Cited by 15 (3 self)
 Add to MetaCart
This paper addresses the problem of designing scalable concurrent priority queues for large scale multiprocessors – machines with up to several hundred processors. Priority queues are fundamental in the design of modern multiprocessor algorithms, with many classical applications ranging from numerical algorithms through discrete event simulation and expert systems. While highly scalable approaches have been introduced for the special case of queues with a fixed set of priorities, the most efficient designs for the general case are based on the parallelization of the heap data structure. Though numerous intricate heapbased schemes have been suggested in the literature, their scalability seems to be limited to small machines in the range of ten to twenty processors. This paper proposes an alternative approach: to base the design of concurrent priority queues on the probabilistic skiplist data structure, rather than on a heap. To this end, we show that a concurrent skiplist structure, following a simple set of modifications, provides a concurrent priority queue with a higher level of parallelism and significantly less contention than the fastest known heapbased algorithms. Our initial empirical evidence, collected on a simulated 256 node shared memory multiprocessor architecture similar to the MIT Alewife, suggests that the new skiplist based priority queue algorithm scales significantly better than heap based schemes throughout most of the concurrency range. With 256 processors, they are about 3 times faster in performing deletions and up to 10 times faster in performing insertions.