Results 1 - 10
of
58
Non-blocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors
- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1998
"... Most multiprocessors are multiprogrammed in order to achieve acceptable response time and to increase their uti-lization. Unfortunately, inopportune preemption may significantly degrade the performance of synchronized parallel applications. To address this problem, researchers have developed two pri ..."
Abstract
-
Cited by 65 (1 self)
- Add to MetaCart
Most multiprocessors are multiprogrammed in order to achieve acceptable response time and to increase their uti-lization. Unfortunately, inopportune preemption may significantly degrade the performance of synchronized parallel applications. To address this problem, researchers have developed two principal strategies for concurrent, atomic update of shared data structures: (1) preemption-safe locking and (2) non-blocking (lock-free) algorithms. Preemption-safe locking requires kernel support. Non-blocking algorithms generally require a universal atomic primitive such as compare-and-swap orload-linked/store-conditional, and are widely regarded as inefficient. We evaluate the performance of preemption-safe lock-based and non-blocking implementations of important data structures—queues, stacks, heaps, and counters—including non-blocking and lock-based queue algorithms of our own, in micro-benchmarks and real applications on a 12-processor SGI Challenge multiprocessor. Our results indicate that our non-blocking queue consistently outperforms the best known alternatives, and that data-structure-specific non-blocking algorithms, which exist for queues, stacks, and counters, can work extremely well. Not only do they outperform preemption-safe lock-based algorithms on multiprogrammed machines, they also outperform ordinary locks on dedicated machines. At the same time, since general-purpose non-blocking techniques do not yet appear to be practical, preemption-safe locks remain the preferred alternative for complex data structures: they outperform
Threads cannot be implemented as a library
- In PLDI
, 2005
"... threads, library, register promotion, compiler optimization, garbage collection In many environments, multi-threaded code is written in a language that was originally designed without thread support (e.g. C), to which a library of threading primitives was subsequently added. There appears to be a ge ..."
Abstract
-
Cited by 51 (3 self)
- Add to MetaCart
threads, library, register promotion, compiler optimization, garbage collection In many environments, multi-threaded code is written in a language that was originally designed without thread support (e.g. C), to which a library of threading primitives was subsequently added. There appears to be a general understanding that this is not the right approach. We provide specific arguments that a pure library approach, in which the compiler is designed independently of threading issues, cannot guarantee correctness of the resulting code. We first review why the approach almost works, and then examine some of the surprising behavior it may entail. We further illustrate that there are very simple cases in which a pure library-based approach seems incapable of expressing an efficient parallel algorithm. Our discussion takes place in the context of C with Pthreads, since it is commonly used, reasonably well specified, and does not attempt to ensure type-safety, which would entail even stronger constraints. The issues we raise are not specific to that context.
Implementing Lock-Free Queues
- In Proceedings of the Seventh International Conference on Parallel and Distributed Computing Systems, Las Vegas, NV
, 1994
"... We study practical techniques for implementing the FIFO queue abstract data type using lock-free data structures, which synchronize the operations of concurrent processes without the use of mutual exclusion. Two new algorithms based on linked lists and arrays are presented. We also propose a new sol ..."
Abstract
-
Cited by 49 (1 self)
- Add to MetaCart
We study practical techniques for implementing the FIFO queue abstract data type using lock-free data structures, which synchronize the operations of concurrent processes without the use of mutual exclusion. Two new algorithms based on linked lists and arrays are presented. We also propose a new solution to the ABA problem associated with the Compare&Swap instruction. The performance of our linked list algorithm is compared several other lock-free queue implementations, as well as more conventional locking techniques. 1 Introduction Concurrent access to a data structure shared among several processes must be synchronized in order to avoid conflicting updates. Conventionally this is done using mutual exclusion; processes modify the data structure only inside a critical section of code, within which the process is guaranteed exclusive access to the data structure. Typically, on a multiprocessor, critical sections are guarded with a spin-lock . We will refer to all methods using mutual ...
The repeat offender problem: A mechanism for supporting dynamic-sized, lock-free data structures
- In Proceedings of the 16th International Symposium on Distributed Computing
, 2002
"... We define the Repeat Offender Problem (ROP). Elsewhere, we have presented the first dynamic-sized lock-free data structures that can free memory to any standard memory allocator—even after thread failures—without requiring special support from the operating system, the memory allocator, or the hardw ..."
Abstract
-
Cited by 44 (10 self)
- Add to MetaCart
We define the Repeat Offender Problem (ROP). Elsewhere, we have presented the first dynamic-sized lock-free data structures that can free memory to any standard memory allocator—even after thread failures—without requiring special support from the operating system, the memory allocator, or the hardware. These results depend on a solution to the ROP problem. Here we present the first solution to the ROP problem and its correctness proof. Our solution is implementable in most modern shared memory multiprocessors. M/S MTV29-01
Pragmatic Nonblocking Synchronization for Real-Time Systems
, 2001
"... We present a pragmatic methodology for designing nonblocking real-time systems. Our methodology uses a combination of lock-free and wait-free synchronization techniques and clearly states which technique should be applied in which situation. ..."
Abstract
-
Cited by 36 (12 self)
- Add to MetaCart
We present a pragmatic methodology for designing nonblocking real-time systems. Our methodology uses a combination of lock-free and wait-free synchronization techniques and clearly states which technique should be applied in which situation.
A scalable lock-free stack algorithm
- In SPAA’04: Symposium on Parallelism in Algorithms and Architectures
, 2004
"... The literature describes two high performance concurrent stack algorithms based on combining funnels and elimination trees. Unfortunately, the funnels are linearizable but blocking, and the elimination trees are non-blocking but not linearizable. Neither is used in practice since they perform well o ..."
Abstract
-
Cited by 30 (5 self)
- Add to MetaCart
The literature describes two high performance concurrent stack algorithms based on combining funnels and elimination trees. Unfortunately, the funnels are linearizable but blocking, and the elimination trees are non-blocking but not linearizable. Neither is used in practice since they perform well only at exceptionally high loads. The literature also describes a simple lock-free linearizable stack algorithm that works at low loads but does not scale as the load increases. The question of designing a stack algorithm that is non-blocking, linearizable, and scales well throughout the concurrency range, has thus remained open. This paper presents such a concurrent stack algorithm. It is based on the following simple observation: that a single elimination array used as a backoff scheme for a simple lock-free stack is lock-free, linearizable, and scalable. As our empirical results show, the resulting eliminationbackoff stack performs as well as the simple stack at low loads, and increasingly outperforms all other methods (lock-based and non-blocking) as concurrency increases. We believe its simplicity and scalability make it a viable practical alternative to existing constructions for implementing concurrent stacks.
Two-Handed Emulation: How to build non-blocking implementations of complex data-structures using DCAS
- In Proceedings of the 21st Annual Symposium on Principles of Distributed Computing
, 2002
"... This paper partly addresses the question of whether, in principle, there is any point in adding richer hardware synchronization primitives when the existing set is \universal", and therefore sucient to synchronize any data structure in a non-blocking manner. The context of this paper is the ongoing ..."
Abstract
-
Cited by 25 (0 self)
- Add to MetaCart
This paper partly addresses the question of whether, in principle, there is any point in adding richer hardware synchronization primitives when the existing set is \universal", and therefore sucient to synchronize any data structure in a non-blocking manner. The context of this paper is the ongoing investigation of the utility of adding a DCAS instruction to modern processors to aid the design and performance of non-blocking algorithms. We add one more piece of evidence in support of this instruction.
Comparison under abstraction for verifying linearizability
- In 19th International Conference on Computer Aided Verification (CAV
, 2007
"... Abstract. Linearizability is one of the main correctness criteria for implementations of concurrent data structures. A data structure is linearizable if its operations appear to execute atomically. Verifying linearizability of concurrent unbounded linked data structures is a challenging problem beca ..."
Abstract
-
Cited by 18 (7 self)
- Add to MetaCart
Abstract. Linearizability is one of the main correctness criteria for implementations of concurrent data structures. A data structure is linearizable if its operations appear to execute atomically. Verifying linearizability of concurrent unbounded linked data structures is a challenging problem because it requires correlating executions that manipulate (unbounded-size) memory states. We present a static analysis for verifying linearizability of concurrent unbounded linked data structures. The novel aspect of our approach is the ability to prove that two (unboundedsize) memory layouts of two programs are isomorphic in the presence of abstraction. A prototype implementation of the analysis verified the linearizability of several published concurrent data structures implemented by singly-linked lists. 1
Hardware Acceleration of Software Transactional Memory
- DEPT. OF COMPUTER SCIENCE, UNIV. OF ROCHESTER
, 2006
"... Transactional memory (TM) systems seek to increase scalability, reduce programming complexity, and overcome the various semantic problems associated with locks. Software TM proposals run on stock processors and provide substantial flexibility in policy, but incur significant overhead for data versio ..."
Abstract
-
Cited by 17 (8 self)
- Add to MetaCart
Transactional memory (TM) systems seek to increase scalability, reduce programming complexity, and overcome the various semantic problems associated with locks. Software TM proposals run on stock processors and provide substantial flexibility in policy, but incur significant overhead for data versioning and validation in the face of conflicting transactions. Hardware TM proposals have the advantage of speed, but are typically highly ambitious, embed significant amounts of policy in silicon, and provide no clear migration path for software that must also run on legacy machines. We advocate an intermediate approach, in which hardware is used to accelerate a TM implementation controlled fundamentally by software. We present a system, RTM, that embodies this approach. It consists of a novel transactional MESI (TMESI) protocol and accompanying TM software. TMESI eliminates the key software overheads of data copying, garbage collection, and validation, without introducing any global consensus algorithm in the cache coherence protocol (a commit is allowed to perform using only a few cycles of completely local operation). The only change to the snooping interface is a “threatened” signal analogous to the existing “shared” signal. By leaving policy to software, RTM allows us to experiment with a wide variety of policies for contention management, deadlock and livelock avoidance, data granularity, nesting, and virtualization.
Shape-value abstraction for verifying linearizability
, 2008
"... Abstract. This paper presents a novel abstraction for heap-allocated data structures that keeps track of both their shape and their contents. By combining this abstraction with thread-local analysis and relyguarantee reasoning, we can verify a collection of fine-grained blocking and non-blocking con ..."
Abstract
-
Cited by 16 (2 self)
- Add to MetaCart
Abstract. This paper presents a novel abstraction for heap-allocated data structures that keeps track of both their shape and their contents. By combining this abstraction with thread-local analysis and relyguarantee reasoning, we can verify a collection of fine-grained blocking and non-blocking concurrent algorithms for an arbitrary (unbounded) number of threads. We prove that these algorithms are linearizable, namely equivalent (modulo termination) to their sequential counterparts. 1

