A Methodology for Implementing Highly Concurrent Data Objects
, 1993
A concurrent object is a data structure shared by concurrent processes. Conventional techniques for implementing concurrent objects typically rely on critical sections: ensuring that only one process at a time can operate on the object. Nevertheless, critical sections are poorly suited for asynchronous systems: if one process is halted or delayed in a critical section, other, nonfaulty processes will be unable to progress. By contrast, a concurrent object implementation is lock-free if it always guarantees that some process will complete an operation in a finite number of steps, and it is wait-free if it guarantees that each process will complete an operation in a finite number of steps. This paper proposes a new methodology for constructing lock-free and wait-free implementations of concurrent objects. The object's representation and operations are written as stylized sequential programs, with no explicit synchronization. Each sequential operation is automatically transformed into a lock-free or wait-free operation using novel synchronization and memory management algorithms. These algorithms are presented for a multiple instruction/multiple data (MIMD) architecture in which n processes communicate by applying atomic read, write, load_linked, and store_conditional operations to a shared memory.
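The idea of turning a sequential operation into a lock-free one can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's algorithm: it copies the whole object, applies the sequential operation to the private copy, and then tries to swing a shared reference to the copy, retrying on failure. Python has no load_linked/store_conditional, so a simulated compare-and-set stands in for them; the names AtomicRef, apply_lock_free, and push are hypothetical.

```python
import threading


class AtomicRef:
    """Simulated atomic reference (hypothetical helper).

    The internal lock only models the atomicity of a hardware primitive;
    it is not part of the technique being illustrated.
    """

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def compare_and_set(self, expected, new):
        # Succeeds only if no other thread replaced the reference meanwhile.
        with self._guard:
            if self._value is expected:
                self._value = new
                return True
            return False


def apply_lock_free(ref, sequential_op):
    """Run a sequential operation on a private copy, then try to publish it."""
    while True:
        old = ref.load()
        new = list(old)              # private copy of the current version
        result = sequential_op(new)  # plain sequential code, no synchronization
        if ref.compare_and_set(old, new):
            return result
        # Another thread published first: retry against the newer version.


def push(item):
    """Build a sequential 'append' operation for a list-based object."""
    def op(state):
        state.append(item)
    return op
```

A failed publish means some other thread succeeded, which is the lock-free progress condition quoted above; wait-freedom would need an additional helping mechanism that this sketch omits.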
Multimodel parallel programming in Psyche
 In Proceedings of the 2nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, 1990
Many different parallel programming models, including lightweight processes that communicate with shared memory and heavyweight processes that communicate with messages, have been used to implement parallel applications. Unfortunately, operating systems and languages designed for parallel programming typically support only one model. Multimodel parallel programming is the simultaneous use of several different models, both across programs and within a single program. This paper describes multimodel parallel programming in the Psyche multiprocessor operating system. We explain why multimodel programming is desirable and present an operating system interface designed to support it. Through a series of three examples, we illustrate how the Psyche operating system supports different models of parallelism and how the different models are able to interact.
An Efficient Algorithm for Concurrent Priority Queue Heaps
 Inf. Proc. Letters
, 1996
We present a new algorithm for concurrent access to array-based priority queue heaps. Deletions proceed top-down as they do in a previous algorithm due to Rao and Kumar [6], but insertions proceed bottom-up, and consecutive insertions use a bit-reversal technique to scatter accesses across the fringe of the tree, to reduce contention. Because insertions do not have to traverse the entire height of the tree (as they do in previous work), as many as O(M) operations can proceed in parallel, rather than O(log M) on a heap of size M. Experimental results on a Silicon Graphics Challenge multiprocessor demonstrate good overall performance for the new algorithm on small heaps, and significant performance improvements over known alternatives on large heaps with mixed insertion/deletion workloads. This work was supported in part by NSF grants nos. CDA-8822724 and CCR-9319445, and by ONR research grant no. N00014-92-J-1801 (in conjunction with the DARPA Research in Information Science and Tech...
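The bit-reversal idea can be illustrated with a small sketch. It assumes the bottom level of the heap has 2^k slots and that the i-th consecutive insertion targets the slot whose k-bit index is the bit reversal of i; the function names are hypothetical and the paper's actual bookkeeping may differ.

```python
def bit_reverse(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def insertion_slots(level_bits):
    """Order in which the 2**level_bits slots of one heap level get filled
    (assumed scheme: the i-th insertion goes to slot bit_reverse(i))."""
    return [bit_reverse(i, level_bits) for i in range(1 << level_bits)]


print(insertion_slots(3))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Consecutive insertions land in different halves, then different quarters, of the level, so their bottom-up paths toward the root share few nodes until they are close to the top.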
Lazy Queue: A new approach to implementing the Pending-event Set
In discrete event simulation, very often the future event set is represented by a priority queue. The data structure used to implement the queue and the way operations are performed on it are often crucial to the execution time of a simulation. In this paper a new priority queue implementation strategy, the Lazy Queue, is presented. It is tailored to handle operations on the pending event set efficiently. The Lazy Queue is a kind of multi-list data structure that delays the sorting process until a point near the time where the elements are to be dequeued. In this way, the time needed to sort new elements in the queue is reduced. We have performed several experiments comparing queue access times with the access times of the implicit heap and the calendar queue. Our experimental results indicate that the Lazy Queue is superior to these priority queue implementations. Key words: Discrete Event Simulation, Priority Queue, Event List implementation, performance measurement.
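The notion of delaying the sort until elements are about to be dequeued can be sketched as below. This is a simplified illustration, not the authors' data structure: it assumes events are dropped unsorted into fixed-width time buckets and that a bucket is sorted only when the dequeue front reaches it. The class name LazyBucketQueue and the bucket_width parameter are hypothetical.

```python
import bisect
from collections import defaultdict
from itertools import count


class LazyBucketQueue:
    """Simplified lazy-sorting event queue (assumed design, for illustration)."""

    def __init__(self, bucket_width=1.0):
        self.bucket_width = bucket_width
        self.far = defaultdict(list)  # bucket index -> unsorted events
        self.near = []                # sorted events of the current bucket
        self.near_index = None        # index of the bucket held in `near`
        self._seq = count()           # tie-breaker so items are never compared

    def enqueue(self, time, item):
        entry = (time, next(self._seq), item)
        index = int(time // self.bucket_width)
        if self.near_index is not None and index <= self.near_index:
            bisect.insort(self.near, entry)  # near future: keep sorted
        else:
            self.far[index].append(entry)    # far future: no sorting yet

    def dequeue(self):
        if not self.near:
            if not self.far:
                raise IndexError("dequeue from empty queue")
            # Sort the earliest non-empty bucket only now that it is needed.
            self.near_index = min(self.far)
            self.near = sorted(self.far.pop(self.near_index))
        time, _, item = self.near.pop(0)
        return time, item
```

Enqueueing into a far bucket is a constant-time append here; the sorting cost is paid once per bucket, shortly before its events are consumed.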
Parallel Priority Queues
, 1991
This paper introduces the Parallel Priority Queue (PPQ) abstract data type. A PPQ stores a set of integer-valued items and provides operations such as insertion of n new items or deletion of the n smallest ones. Algorithms for realizing PPQ operations on an n-processor CREW-PRAM are based on two new data structures, the n-Bandwidth-Heap (nH) and the n-Bandwidth-Leftist-Heap (nL), that are obtained as extensions of the well known sequential binary-heap and leftist-heap, respectively. Using these structures, it is shown that insertion of n new items in a PPQ of m elements can be performed in parallel time O(h + log n), where h = log(m/n), while deletion of the n smallest items can be performed in time O(h + log log n). Keywords: Data structures, parallel algorithms, analysis of algorithms, heaps, PRAM model. This work has been partly supported by the Ministero della Pubblica Istruzione of Italy and by the C.N.R. project "Sistemi Informatici e Calcolo Parallelo". Istituto di Ela...
Concurrent Heaps on the BSP Model
, 1996
In this paper we present a new randomized selection algorithm on the Bulk-Synchronous Parallel (BSP) model of computation along with an application of this algorithm to dynamic data structures, namely Parallel Priority Queues (PPQs). We show that our algorithms improve upon previous results in both the communication requirements and the amount of parallel slack required to achieve optimal performance. We also establish that optimality to within small multiplicative constant factors can be achieved for a wide range of parallel machines. While these algorithms are fairly simple themselves, descriptions of their performance in terms of the BSP parameters are somewhat involved. The main reward of quantifying these complications is that it allows transportable software to be written for parallel machines that fit the model. We also present experimental results for the selection algorithm that reinforce our claims.
Amortization Results for Chromatic Search Trees, with an Application to Priority Queues
, 1997
In this paper, we prove that only an amortized constant amount of rebalancing is necessary after an update in a chromatic search tree. We also prove that the amount of rebalancing done at any particular level decreases exponentially, going from the leaves toward the root. These results imply that, in principle, a linear number of processes can access the tree simultaneously. We have included one interesting application of chromatic trees. Based on these trees, a priority queue with possibilities for a greater degree of parallelism than previous proposals can be implemented. © 1997 Academic Press
Concurrent Data Structures
, 2001
The proliferation of commercial shared-memory multiprocessor machines has brought about significant changes in the art of concurrent programming. Given current trends towards low-cost chip multithreading (CMT), such machines are bound to become ever more widespread. Shared-memory multiprocessors are systems that concurrently execute multiple threads of computation which communicate and synchronize through data structures in shared memory. The efficiency of these data structures is crucial to performance, yet designing effective data structures for multiprocessor machines is an art currently mastered by few. By most accounts, concurrent data structures are far more difficult to design than sequential ones because threads executing concurrently may interleave their steps in many ways, each with a different and potentially unexpected outcome. This requires designers to modify the way they think about computation, to understand new design methodologies, and to adopt a new collection of programming tools. Furthermore, new challenges arise in designing scalable concurrent data structures that continue to perform well as machines that execute more and more concurrent threads become available. This chapter provides an overview of the challenges involved in designing concurrent data structures, and a summary of relevant work
Portable Distributed Priority Queues with MPI
, 1995
Part of this work has been presented in [17]. This paper analyzes the performance of portable distributed priority queues by examining the theoretical features required and by comparing various implementations. In spite of intrinsic bottlenecks and induced hot-spots, we argue that tree topologies are attractive to manage the natural centralized control required for the delete-min operation in order to detect the site which holds the item with the largest priority. We introduce an original perfect balancing to cope with the load variation due to the priority queue operations which continuously modify the overall number of items in the network. For comparison, we introduce the d-heap and the binomial distributed priority queue. The purpose of this experiment is to convey, through executions on Cray-T3D and Meiko-T800, an understanding of the nature of the distributed priority queues, the range of their concurrency and a comparison of their efficiency to reduce request latency. In particu...
Algorithms for Combinatorial Optimization in Real Time and their Automated Refinement by Genetics-Based Learning
 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
, 1994
The goal of this research is to develop a systematic, integrated method of designing efficient search algorithms that solve optimization problems in real time. Search algorithms studied in this thesis comprise meta-control and primitive search. The class of optimization problems addressed is called combinatorial optimization problems, examples of which include many NP-hard scheduling and planning problems, and problems in operations research and artificial-intelligence applications. The problems we have addressed have a well-defined problem objective and a finite set of well-defined problem constraints. In this research, we use state-space trees as problem representations. The approach we have undertaken in designing efficient search algorithms is an engineering approach and consists of two phases: (a) designing generic search algorithms, and (b) improving by genetics-based machine learning methods parametric heuristics used in the search algorithms designed. Our approach is a systematic method that integrates domain knowledge, search techniques, and automated learning techniques for designing better search algorithms. Knowledge captured in designing one search algorithm can be carried over for designing new ones.