Results 11 - 20
of
32
Revocable locks for non-blocking programming
- In PPoPP
, 2005
"... In this paper we present a new form of revocable lock that streamlines the construction of higher level concurrency abstractions such as atomic multi-word heap updates. The key idea is to expose revocation by displacing the previous lock holder’s execution to a safe address. This provides mutual exc ..."
Abstract
-
Cited by 12 (2 self)
- Add to MetaCart
In this paper we present a new form of revocable lock that streamlines the construction of higher level concurrency abstractions such as atomic multi-word heap updates. The key idea is to expose revocation by displacing the previous lock holder’s execution to a safe address. This provides mutual exclusion without needing to block threads. This brings many simplifications, often removing the need for dynamic memory management and letting us strip operations from common-case execution paths. As well as streamlining algorithms ’ design, our results show that the technique leads to improved performance and scalability across a range of levels of contention.
An optimistic approach to lock-free fifo queues
- In Proceedings of the 18th International Symposium on Distributed Computing, LNCS 3274
, 2004
"... Abstract. First-in-first-out (FIFO) queues are among the most fundamental and highly studied concurrent data structures. The most effective and practical dynamic-memory concurrent queue implementation in the literature is the lock-free FIFO queue algorithm of Michael and Scott, included in the stand ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
Abstract. First-in-first-out (FIFO) queues are among the most fundamental and highly studied concurrent data structures. The most effective and practical dynamic-memory concurrent queue implementation in the literature is the lock-free FIFO queue algorithm of Michael and Scott, included in the standard Java TM Concurrency Package. This paper presents a new dynamic-memory lock-free FIFO queue algorithm that performs consistently better than the Michael and Scott queue. The key idea behind our new algorithm is a novel way of replacing the singly-linked list of Michael and Scott, whose pointers are inserted using a costly compare-and-swap (CAS) operation, by an “optimistic” doubly-linked list whose pointers are updated using a simple store, yet can be “fixed ” if a bad ordering of events causes them to be inconsistent. We believe it is the first example of such an “optimistic ” approach being applied to a real world data structure. 1
Performance of memory reclamation for lockless synchronization
, 2007
"... Achieving high performance for concurrent applications on modern multiprocessors remains challenging. Many programmers avoid locking to improve performance, while others replace locks with non-blocking synchronization to protect against deadlock, priority inversion, and convoying. In both cases, dyn ..."
Abstract
-
Cited by 9 (8 self)
- Add to MetaCart
Achieving high performance for concurrent applications on modern multiprocessors remains challenging. Many programmers avoid locking to improve performance, while others replace locks with non-blocking synchronization to protect against deadlock, priority inversion, and convoying. In both cases, dynamic data structures that avoid locking require a memory reclamation scheme that reclaims elements once they are no longer in use. The performance of existing memory reclamation schemes has not been thoroughly evaluated. We conduct the first fair and comprehensive comparison of three recent schemes—quiescent-state-based reclamation, epoch-based reclamation, and hazard-pointer-based reclamation—using a flexible microbenchmark. Our results show that there is no globally optimal scheme. When evaluating lockless synchronization, programmers and algorithm designers should thus carefully consider the data structure, the workload, and the execution environment, each of which can dramatically affect the memory reclamation performance. We discuss the consequences of our results for programmers and algorithm designers. Finally, we describe the use of one scheme, quiescentstate-based reclamation, in the context of an OS kernel—an execution environment which is well suited to this scheme.
Lock-Free and Practical Doubly Linked List-Based Deques Using Single-Word Compare-and-Swap
, 2005
"... We present an efficient and practical lock-free implementation of a concurrent deque that supports parallelism for disjoint accesses and uses atomic primitives which are available in modern computer systems. Previously known lock-free algorithms of deques are either based on non-available atomic syn ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
We present an efficient and practical lock-free implementation of a concurrent deque that supports parallelism for disjoint accesses and uses atomic primitives which are available in modern computer systems. Previously known lock-free algorithms of deques are either based on non-available atomic synchronization primitives, only implement a subset of the functionality, or are not designed for disjoint accesses. Our algorithm is based on a general lock-free doubly linked list, and only requires single-word compare-and-swap atomic primitives. It also allows pointers with full precision, and thus supports dynamic deque sizes. We have performed an empirical study using full implementations of the most efficient known algorithms of lock-free deques. For systems with low concurrency, the algorithm by Michael shows the best performance. However, as our algorithm is designed for disjoint accesses, it performs significantly better on systems with high concurrency and non-uniform memory architecture. In addition, the proposed solution also implements a general doubly linked list, the first lock-free implementation that only needs the single-word compare-and-swap atomic primitive.
Allocating memory in a lock-free manner
, 2005
"... The potential of multiprocessor systems is often not fully realized by their system services. Certain synchronization methods, such as lock-based ones, may limit the parallelism. It is significant to see the impact of wait/lock-free synchronization design in key services for multiprocessor systems, ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
The potential of multiprocessor systems is often not fully realized by their system services. Certain synchronization methods, such as lock-based ones, may limit the parallelism. It is significant to see the impact of wait/lock-free synchronization design in key services for multiprocessor systems, such as the memory allocation service. Efficient, scalable memory allocators for multithreaded applications on multiprocessors is a significant goal of recent research projects. We propose a lock-free memory allocator, to enhance the parallelism in the system. Its architecture is inspired by Hoard, a successful concurrent memory allocator, with a modular, scalable design that preserves scalability and helps avoiding false-sharing and heap blowup. Within our effort on designing appropriate lock-free algorithms to construct this system, we propose a new non-blocking data structure called flat-sets, supporting conventional “internal” operations as well as “inter-object” operations, for moving items between flat-sets. We implemented the memory allocator in a set of multiprocessor systems (UMA Sun Enterprise 450 and ccNUMA Origin 3800) and studied its behaviour. The results show that the good properties of Hoard w.r.t. false-sharing and heap-blowup are preserved, while the scalability properties are enhanced even further with the help of lock-free synchronization.
ABA prevention using single-word instructions
- IBM T. J. Watson Research Center
, 2004
"... been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). Copies may be requested from IBM T. J. Watson Research Center, P.
Thread Cooperation in Multicore Architectures for Frequency Counting over Multiple Data Streams
"... Many real-world data stream analysis applications such as network monitoring, click stream analysis, and others require combining multiple streams of data arriving from multiple sources. This is referred to as multi-stream analysis. To deal with high stream arrival rates, it is desirable that such s ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
Many real-world data stream analysis applications such as network monitoring, click stream analysis, and others require combining multiple streams of data arriving from multiple sources. This is referred to as multi-stream analysis. To deal with high stream arrival rates, it is desirable that such systems be capable of supporting very high processing throughput. The advent of multicore processors and powerful servers driven by these processors calls for efficient parallel designs that can effectively utilize the parallelism of the multicores, since performance improvement is possible only through effective parallelism. In this paper, we address the problem of parallelizing multi-stream analysis in the context of multicore processors. Specifically, we concentrate on parallelizing frequent elements, top-k, and frequency counting over multiple streams. We discuss the challenges in designing an efficient parallel system for multi-stream processing. Our evaluation and analysis reveals that traditional “contention ” based locking results in excessive overhead and wait, which in turn leads to severe performance degradation in modern multicore architectures. Based on our analysis, we propose a “cooperation ” based locking paradigm for efficient parallelization of frequency counting. The proposed “cooperation ” based paradigm removes waits associated with synchronization, and allows replacing locks by much cheaper atomic synchronization primitives. Our implementation of the proposed paradigm to parallelize a well known frequency counting algorithm shows the benefits of the proposed “cooperation ” based locking paradigm when compared to the traditional “contention” based locking paradigm. In our experiments, the proposed “cooperation” based design outperforms the traditional “contention ” based design by a factor of 2 − 5.5X for synthetic zipfian data sets. 1.
Efficient and reliable lock-free memory reclamation based on reference counting
- In Proc. 8th I-SPAN
, 2005
"... We present an efficient and practical lock-free implementation of a memory reclamation scheme based on reference counting, aimed for use with arbitrary lock-free dynamic data structures. The scheme guarantees the safety of local as well as global references, supports arbitrary memory reuse, uses ato ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
We present an efficient and practical lock-free implementation of a memory reclamation scheme based on reference counting, aimed for use with arbitrary lock-free dynamic data structures. The scheme guarantees the safety of local as well as global references, supports arbitrary memory reuse, uses atomic primitives which are available in modern computer systems and provides an upper bound on the memory prevented for reuse. To the best of our knowledge, this is the first lock-free algorithm that provides all of these properties. Experimental results indicate significant performance improvements for lock-free algorithms of dynamic data structures that require strong garbage collection support. 1.
Making lockless synchronization fast: Performance implications of memory reclamation
- In 2006 International Parallel and Distributed Processing Symposium (IPDPS 2006
, 2006
"... Achieving high performance for concurrent applications on modern multiprocessors remains challenging. Many programmers avoid locking to improve performance, while others replace locks with non-blocking synchronization to protect against deadlock, priority inversion, and convoying. In both cases, dyn ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
Achieving high performance for concurrent applications on modern multiprocessors remains challenging. Many programmers avoid locking to improve performance, while others replace locks with non-blocking synchronization to protect against deadlock, priority inversion, and convoying. In both cases, dynamic data structures that avoid locking, require a memory reclamation scheme that reclaims nodes once they are no longer in use. The performance of existing memory reclamation schemes has not been thoroughly evaluated. We conduct the first fair and comprehensive comparison of three recent schemes—quiescent-state-based reclamation, epoch-based reclamation, and hazard-pointer-based reclamation—using a flexible microbenchmark. Our results show that there is no globally optimal scheme. When evaluating lockless synchronization, programmers and algorithm designers should thus carefully consider the data structure, the workload, and the execution environment, each of which can dramatically affect memory reclamation performance. 1
Brief announcement: Dynamic-sized lock-free data structures
- In To appear in Proceedings of the Twenty-First Symposium on Principles of Distributed Computing
, 2002
"... Almost all previous dynamic-sized lock-free data structures are either unable to free memory to the memory allocator when it is no longer required, or require special system or hardware support. In the only exception we are aware of, a single thread failure can prevent further memory reclamation (se ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
Almost all previous dynamic-sized lock-free data structures are either unable to free memory to the memory allocator when it is no longer required, or require special system or hardware support. In the only exception we are aware of, a single thread failure can prevent further memory reclamation (see full paper for reference). We recently posed a problem — the Repeat Offenders Problem (ROP) — and presented one solution and its correctness proof [3]. Solutions to this problem can be used to design dynamic-sized lockfree implementations of shared data structures that overcome all of the problems mentioned above. In the full paper [2], we present two results. The first is a general methodology, based on any ROP solution, for transforming dynamic-sized lock-free data structure implementations that depend on garbage collection (GC) for memory management into equivalent ones that do not require GC. This methodology is based on reference counts, and therefore entails space and time overhead required to maintain reference counts. (This methodology improves on the one presented in [1] by removing the dependence on double compare-and-swap (DCAS).) ROP can also be applied directly to achieve more efficient (both in space and time) implementations of dynamic-sized lock-free data structures. In the full paper, we give an example to demonstrate this approach. Specifically, we show how to modify the widely-used lock-free FIFO queue implementation of Michael and Scott so that it can free memory to the memory allocator when it is no longer required. We also present the results of performance experiments

