Skiplist-based Concurrent Priority Queues
, 2000
Cited by 15 (3 self)
This paper addresses the problem of designing scalable concurrent priority queues for large-scale multiprocessors (machines with up to several hundred processors). Priority queues are fundamental in the design of modern multiprocessor algorithms, with many classical applications ranging from numerical algorithms through discrete event simulation and expert systems. While highly scalable approaches have been introduced for the special case of queues with a fixed set of priorities, the most efficient designs for the general case are based on the parallelization of the heap data structure. Though numerous intricate heap-based schemes have been suggested in the literature, their scalability seems to be limited to small machines in the range of ten to twenty processors. This paper proposes an alternative approach: to base the design of concurrent priority queues on the probabilistic skiplist data structure, rather than on a heap. To this end, we show that a concurrent skiplist structure, following a simple set of modifications, provides a concurrent priority queue with a higher level of parallelism and significantly less contention than the fastest known heap-based algorithms. Our initial empirical evidence, collected on a simulated 256-node shared-memory multiprocessor architecture similar to the MIT Alewife, suggests that the new skiplist-based priority queue algorithm scales significantly better than heap-based schemes throughout most of the concurrency range. With 256 processors, the skiplist-based queues are about 3 times faster in performing deletions and up to 10 times faster in performing insertions.
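To make the idea in this abstract concrete, here is a minimal sequential sketch of a priority queue built on a skiplist. It is not the paper's concurrent algorithm: it has no locking or contention handling, and the names `SkipListPQ`, `insert`, `delete_min`, and `MAX_HEIGHT`, as well as the promotion probability of 0.5, are assumptions chosen for illustration.

```python
import random


class _Node:
    """One skiplist node; forward[i] is the next node at level i."""
    __slots__ = ("key", "value", "forward")

    def __init__(self, key, value, height):
        self.key = key
        self.value = value
        self.forward = [None] * height


class SkipListPQ:
    """Sequential skiplist used as a min-priority queue (illustrative only)."""
    MAX_HEIGHT = 16  # assumed cap on tower height

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, None, self.MAX_HEIGHT)  # sentinel, key never compared
        self._size = 0

    def _random_height(self):
        # Each extra level is kept with probability 1/2 (assumed parameter).
        height = 1
        while height < self.MAX_HEIGHT and self._rng.random() < 0.5:
            height += 1
        return height

    def insert(self, key, value=None):
        # Walk down from the top level, recording the last node visited per level.
        update = [self._head] * self.MAX_HEIGHT
        node = self._head
        for level in range(self.MAX_HEIGHT - 1, -1, -1):
            while node.forward[level] is not None and node.forward[level].key <= key:
                node = node.forward[level]
            update[level] = node
        new = _Node(key, value, self._random_height())
        for level in range(len(new.forward)):
            new.forward[level] = update[level].forward[level]
            update[level].forward[level] = new
        self._size += 1

    def delete_min(self):
        # The minimum is always the first node of the bottom-level list.
        first = self._head.forward[0]
        if first is None:
            raise IndexError("delete_min from empty priority queue")
        for level in range(len(first.forward)):
            self._head.forward[level] = first.forward[level]
        self._size -= 1
        return first.key, first.value

    def __len__(self):
        return self._size
```

In this sketch, `delete_min` only touches the nodes next to the head, while `insert` works at a position determined by the key. That separation is one intuition for why a skiplist may suffer less contention than a heap, where both operations tend to pass through the root.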
Implementing Parallel Algorithms based on Prototype Evaluation and Transformation
- Department of Computer Science, University of Dortmund
, 1997
Cited by 3 (1 self)
Combining parallel programming with prototyping aims to ease parallel programming by letting the programmer experiment with ideas for parallel algorithms at a high level, setting aside low-level considerations of specific parallel architectures at the beginning of program development. Prototyping parallel algorithms thus aims to bridge the gap between the conceptual design of parallel algorithms and their practical implementation on specific parallel systems. The essential prototyping activities are programming, evaluation, and transformation of prototypes. This paper reports on experience with implementing parallel algorithms based on prototype evaluation and transformation, employing the ProSet-Linda approach.
1 Introduction
Parallel programming is conceptually harder to undertake and to understand than sequential programming, because a programmer often has to cope with the coexistence and coordination of multiple parallel activities....
Network Simulation on Cray-T3E using MPI
- 3rd Cray-SGI MPP conf
, 1997
Cited by 1 (1 self)
We propose a novel approach for parallel discrete-event network simulation on packet-switched, point-to-point networks. Our algorithm resolves packet conflicts through priority sorting of appropriate integer conflict functions. We implement our method on CM-5, Cray-T3D, and Cray-T3E systems using C and MPI, and perform critical optimizations aimed at reducing sorting overhead, minimizing inter-processor communication, and optimizing scalar processing. Performance results for a packet-switched hypercube topology indicate that our parallel simulation approach achieves good scalability and efficiency; our optimized simulator can process ~500K packet moves in 1 sec, with an efficiency that exceeds ~50% for a few thousand packets on the Cray-T3E with 32 PEs.
1 Introduction
Simulation is a general approach for performance evaluation and testing of complex systems. Simulation speed depends on the system complexity and the efficiency of simulation algorithms. In discrete time-driven simulati...
Clock Level Simulations of an ATM Switch
, 1996
We simulate an ATM VPI-switch with pipelined (or shared-slot data buffer) and output buffering. Our experiments for 2 × 2 and 4 × 4 switches under constant and time-variant traffic indicate that both switches and memory organizations have near-optimal throughput. Assuming switches are of similar technology, multistage network experiments under constant traffic indicate that 4 × 4 and 16 × 16 networks configured with smaller 2 × 2 switches achieve higher performance. However, networks configured with larger switches are more cost-efficient.
1 INTRODUCTION
ATM is an embrace of developments in circuit and packet switching, allowing for multiple logical connections to be multiplexed over a single physical link (Vetter and Du 1995). These connections are identified by unique virtual path (VPI=12 bits) and virtual circuit (VCI=16 bits) identifiers. The packet size is fixed to 53 bytes, with 48 bytes data and 5 bytes header information. Previous link-to-link header an...
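The cell layout quoted above (12-bit VPI, 16-bit VCI, 5-byte header, 48-byte payload) can be illustrated with a small packing sketch. The remaining header fields used here (3-bit payload type, 1-bit cell loss priority, 8-bit header error control) follow the standard network-node-interface layout and are not taken from this abstract. The function names are hypothetical, and the HEC byte is supplied by the caller rather than computed.

```python
CELL_SIZE = 53      # bytes per ATM cell
HEADER_SIZE = 5     # bytes of header
PAYLOAD_SIZE = 48   # bytes of data


def pack_header(vpi, vci, pt=0, clp=0, hec=0):
    """Pack an NNI-style header: VPI(12) | VCI(16) | PT(3) | CLP(1) | HEC(8)."""
    if not 0 <= vpi < (1 << 12):
        raise ValueError("VPI must fit in 12 bits")
    if not 0 <= vci < (1 << 16):
        raise ValueError("VCI must fit in 16 bits")
    if not 0 <= pt < (1 << 3):
        raise ValueError("PT must fit in 3 bits")
    if clp not in (0, 1):
        raise ValueError("CLP must be 0 or 1")
    if not 0 <= hec < (1 << 8):
        raise ValueError("HEC must fit in 8 bits")
    word = (vpi << 20) | (vci << 4) | (pt << 1) | clp  # first 32 header bits
    return word.to_bytes(4, "big") + bytes([hec])      # HEC is not computed here


def unpack_header(header):
    """Split a 5-byte header back into its fields."""
    if len(header) != HEADER_SIZE:
        raise ValueError("header must be exactly 5 bytes")
    word = int.from_bytes(header[:4], "big")
    return {
        "vpi": word >> 20,
        "vci": (word >> 4) & 0xFFFF,
        "pt": (word >> 1) & 0x7,
        "clp": word & 0x1,
        "hec": header[4],
    }


def build_cell(vpi, vci, payload):
    """Assemble a full 53-byte cell from identifiers and a 48-byte payload."""
    if len(payload) != PAYLOAD_SIZE:
        raise ValueError("payload must be exactly 48 bytes")
    return pack_header(vpi, vci) + payload
```

A switch simulation along the lines described would presumably route on the `vpi` field alone, since the abstract describes a VPI-switch; that reading is an inference, not something the abstract spells out.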
Parallel Priority Queues on Cray-T3E
We examine the design, implementation, and experimental analysis of parallel priority queues for network simulation. We consider: a) distributed splay trees using MPI, b) concurrent heaps using shared memory atomic locks, and c) a new, more general concurrent data structure based on distributed sorted lists, which is designed to provide dynamically balanced work allocation (with automatic or manual control) and efficient use of shared memory resources. We evaluate performance for all three data structures on a Cray-T3E900 system at KFA-Jülich. Our comparisons are based on simulations of single buffers, and a 64 × 64 packet switch which supports multicasting. In all implementations, PEs monitor traffic at their preassigned input/output ports, while priority queue elements are distributed across the Cray-T3E virtual shared memory. Our experiments with up to 60,000 packets and 2 to 64 PEs indicate that concurrent priority queues are 5-10 times faster and more scalable than distribute...
Priority Queues and Sorting Methods for Parallel Simulation
, 2000
We examine the design, implementation, and experimental analysis of parallel priority queues for device and network simulation. We consider: a) distributed splay trees using MPI, b) concurrent heaps using shared memory atomic locks, and c) a new, more general concurrent data structure based on distributed sorted lists, which is designed to provide dynamically balanced work allocation (with automatic or manual control) and efficient use of shared memory resources. We evaluate performance for all three data structures on a Cray-T3E900 system at KFA-Jülich. Our comparisons are based on simulations of single buffers and a 64 × 64 packet switch which supports multicasting. In all implementations, PEs monitor traffic at their preassigned input/output ports, while priority queue elements are distributed across the Cray-T3E virtual shared memory. Our experiments with up to 60,000 packets and 2 to 64 PEs indicate that concurrent priority queues perform much better than distributed ones. Both concurrent implementations have comparable performance, while our new data structure uses less memory and has been further optimized. We also consider parallel simulation for symmetric networks by sorting integer conflict functions and implementing an interesting packet indexing scheme.
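As a rough illustration of option b), a heap shared by several workers and protected by a lock, the sketch below wraps Python's `heapq` in a single coarse-grained `threading.Lock`. This is a deliberate simplification and not the authors' Cray-T3E implementation, whose locking granularity the abstract does not describe; the class name `LockedHeapPQ` and its methods are hypothetical.

```python
import heapq
import itertools
import threading


class LockedHeapPQ:
    """Binary heap shared between threads, guarded by one coarse lock."""

    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()
        # Tie-breaker so items with equal priority never get compared directly.
        self._counter = itertools.count()

    def insert(self, priority, item):
        with self._lock:
            heapq.heappush(self._heap, (priority, next(self._counter), item))

    def delete_min(self):
        """Remove and return (priority, item), or None if the queue is empty."""
        with self._lock:
            if not self._heap:
                return None
            priority, _, item = heapq.heappop(self._heap)
            return priority, item

    def __len__(self):
        with self._lock:
            return len(self._heap)
```

With one lock, every operation is serialized, so this version shows correctness under concurrent access rather than scalability. Finer-grained schemes, such as locking individual heap nodes, trade that simplicity for more parallelism.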
MSL Based Concurrent and Efficient Priority Queue
Priority queues are fundamental in the design of modern multiprocessor algorithms. Priority queues with parallel access are an attractive data structure for applications like prioritized online scheduling, discrete event simulation, or branch-and-bound. This paper proposes an alternative approach: to base the design of concurrent priority queues on the Modified Skip List data structure. To this end, we show that a concurrent Modified Skip List structure, following a simple set of modifications, provides a concurrent priority queue with a higher level of parallelism. Many algorithms for concurrent priority queues are based on mutual exclusion. However, mutual exclusion causes blocking, which has several drawbacks and degrades the system's overall performance. Non-blocking algorithms avoid blocking, and are either lock-free or wait-free. Previously known non-blocking priority queue algorithms did not perform well in practice because of their complexity, and they often rely on atomic synchronization primitives that are not available.