Results 1-10 of 15
The repeat offender problem: A mechanism for supporting dynamic-sized, lock-free data structures
In Proceedings of the 16th International Symposium on Distributed Computing, 2002
Cited by 44 (10 self)
We define the Repeat Offender Problem (ROP). Elsewhere, we have presented the first dynamic-sized lock-free data structures that can free memory to any standard memory allocator, even after thread failures, without requiring special support from the operating system, the memory allocator, or the hardware. These results depend on a solution to ROP. Here we present the first solution to ROP and its correctness proof. Our solution is implementable in most modern shared memory multiprocessors.
DCAS is not a Silver Bullet for Nonblocking Algorithm Design
In SPAA ’04: Proceedings of the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2004
Cited by 15 (0 self)
Despite years of research, the design of efficient nonblocking algorithms remains difficult. A key reason is that current shared-memory multiprocessor architectures support only single-location synchronisation primitives such as compare-and-swap (CAS) and load-linked/store-conditional (LL/SC). Recently researchers have investigated the utility of double-compare-and-swap (DCAS), a generalisation of CAS that supports atomic access to two memory locations, in overcoming these problems. We summarise recent research in this direction and present a detailed case study concerning a previously published nonblocking DCAS-based double-ended queue implementation. Our summary and case study clearly show that DCAS does not provide a silver bullet for nonblocking synchronisation. That is, it does not make the design and verification of even mundane nonblocking data structures with desirable properties easy. Therefore, our position is that while slightly more powerful synchronisation primitives can have a profound effect on ease of algorithm design and verification, DCAS does not provide sufficient additional power over CAS to justify supporting it in hardware.
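To make the distinction between the two primitives concrete, here is a minimal C++ sketch of their semantics. It is not taken from the paper. The helper `cas` wraps the standard `std::atomic<int>::compare_exchange_strong`. The type `DcasPair` is a hypothetical name introduced for illustration: it emulates DCAS with a mutex, so it is a blocking model of what a hardware DCAS would do, not a lock-free implementation.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Single-location CAS, as provided by std::atomic.
// Stores `desired` and returns true only if `loc` still holds `expected`.
inline bool cas(std::atomic<int>& loc, int expected, int desired) {
    return loc.compare_exchange_strong(expected, desired);
}

// Hypothetical model of DCAS semantics over two locations, `a` and `b`.
// ASSUMPTION: the mutex stands in for hardware atomicity, so this is a
// blocking emulation for illustration only.
struct DcasPair {
    int a = 0;
    int b = 0;
    std::mutex guard;

    // Succeeds only if BOTH locations hold their expected values;
    // on success, both are updated in one indivisible step.
    bool dcas(int expect_a, int expect_b, int new_a, int new_b) {
        std::lock_guard<std::mutex> lock(guard);
        if (a != expect_a || b != expect_b) {
            return false;  // either mismatch leaves both locations unchanged
        }
        a = new_a;
        b = new_b;
        return true;
    }
};
```

In this model, a call such as `p.dcas(0, 0, 1, 2)` on a freshly constructed `DcasPair p` succeeds and updates both fields, while a later call whose expected values match only one of the two locations fails and changes nothing.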
Nonblocking memory management support for dynamic-sized data structures
ACM Trans. Comput. Syst., 2005
Cited by 15 (2 self)
Conventional dynamic memory management methods interact poorly with lock-free synchronization. In this article, we introduce novel techniques that allow lock-free data structures to allocate and free memory dynamically using any thread-safe memory management library. Our mechanisms are lock-free in the sense that they do not allow a thread to be prevented from allocating or freeing memory by the failure or delay of other threads. We demonstrate the utility of these techniques by showing how to modify the lock-free FIFO queue implementation of Michael and Scott to free unneeded memory. We give experimental results that show that the overhead introduced by such modifications is moderate, and is negligible under low contention.
Dynamic-sized lockfree data structures
2002
Cited by 12 (3 self)
We address the problem of integrating lockfree shared data structures with standard dynamic allocation mechanisms (such as malloc and free). We have two main contributions. The first is the design and experimental analysis of two dynamic-sized lockfree FIFO queue implementations, which extend Michael and Scott’s previous implementation by allowing unused memory to be freed. We compare our dynamic-sized implementations to the original on 16-processor and 64-processor multiprocessors. Our experimental results indicate that the performance penalty for making the queue dynamic-sized is modest, and is negligible when contention is not too high. These results were achieved by applying a solution to the Repeat Offender Problem (ROP), which we recently posed and solved. Our second contribution is another application of ROP solutions. Specifically, we show how to use any ROP solution to achieve a general methodology for transforming lockfree data structures that rely on garbage collection into ones that use explicit storage reclamation.
Lock-Free and Practical Deques using Single-Word Compare-And-Swap
2004
Cited by 7 (0 self)
We present an efficient and practical lock-free implementation of a concurrent deque that is disjoint-parallel accessible and uses atomic primitives which are available in modern computer systems. Previously known lock-free algorithms for deques are either based on unavailable atomic synchronization primitives, only implement a subset of the functionality, or are not designed for disjoint accesses. Our algorithm is based on a doubly linked list, and only requires single-word compare-and-swap atomic primitives, even for dynamic memory sizes. We have performed an empirical study using full implementations of the most efficient algorithms of lock-free deques known. For systems with low concurrency, the algorithm by Michael shows the best performance. However, as our algorithm is designed for disjoint accesses, it performs significantly better on systems with high concurrency and non-uniform memory architecture.
Efficient and reliable lock-free memory reclamation based on reference counting
In Proc. 8th I-SPAN, 2005
Cited by 4 (0 self)
We present an efficient and practical lock-free implementation of a memory reclamation scheme based on reference counting, aimed for use with arbitrary lock-free dynamic data structures. The scheme guarantees the safety of local as well as global references, supports arbitrary memory reuse, uses atomic primitives which are available in modern computer systems and provides an upper bound on the memory prevented for reuse. To the best of our knowledge, this is the first lock-free algorithm that provides all of these properties. Experimental results indicate significant performance improvements for lock-free algorithms of dynamic data structures that require strong garbage collection support.
Making lockless synchronization fast: Performance implications of memory reclamation
In 2006 International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
Cited by 4 (2 self)
Achieving high performance for concurrent applications on modern multiprocessors remains challenging. Many programmers avoid locking to improve performance, while others replace locks with non-blocking synchronization to protect against deadlock, priority inversion, and convoying. In both cases, dynamic data structures that avoid locking require a memory reclamation scheme that reclaims nodes once they are no longer in use. The performance of existing memory reclamation schemes has not been thoroughly evaluated. We conduct the first fair and comprehensive comparison of three recent schemes (quiescent-state-based reclamation, epoch-based reclamation, and hazard-pointer-based reclamation) using a flexible microbenchmark. Our results show that there is no globally optimal scheme. When evaluating lockless synchronization, programmers and algorithm designers should thus carefully consider the data structure, the workload, and the execution environment, each of which can dramatically affect memory reclamation performance.
Brief announcement: Dynamic-sized lock-free data structures
To appear in Proceedings of the Twenty-First Symposium on Principles of Distributed Computing, 2002
Cited by 3 (1 self)
Almost all previous dynamic-sized lock-free data structures are either unable to free memory to the memory allocator when it is no longer required, or require special system or hardware support. In the only exception we are aware of, a single thread failure can prevent further memory reclamation (see full paper for reference). We recently posed a problem, the Repeat Offenders Problem (ROP), and presented one solution and its correctness proof [3]. Solutions to this problem can be used to design dynamic-sized lock-free implementations of shared data structures that overcome all of the problems mentioned above. In the full paper [2], we present two results. The first is a general methodology, based on any ROP solution, for transforming dynamic-sized lock-free data structure implementations that depend on garbage collection (GC) for memory management into equivalent ones that do not require GC. This methodology is based on reference counts, and therefore entails the space and time overhead required to maintain reference counts. (This methodology improves on the one presented in [1] by removing the dependence on double compare-and-swap (DCAS).) ROP can also be applied directly to achieve more efficient (both in space and time) implementations of dynamic-sized lock-free data structures. In the full paper, we give an example to demonstrate this approach. Specifically, we show how to modify the widely-used lock-free FIFO queue implementation of Michael and Scott so that it can free memory to the memory allocator when it is no longer required. We also present the results of performance experiments
NBMALLOC: Allocating Memory in a Lock-Free Manner
Algorithmica, 2009
Cited by 2 (1 self)
Efficient, scalable memory allocation for multithreaded applications on multiprocessors is a significant goal of recent research. In the distributed computing literature it has been emphasized that lock-based synchronization and concurrency control may limit the parallelism in multiprocessor systems. Thus, system services that employ such methods can hinder reaching the full potential of these systems. A natural research question is the pertinence and the impact of lock-free concurrency control in key services for multiprocessors, such as in the memory allocation service, which is the theme of this work. We show the design and implementation of NBMALLOC, a lock-free memory allocator designed to enhance the parallelism in the system. The architecture of NBMALLOC is inspired by Hoard, a well-known concurrent memory allocator, with a modular design that preserves scalability and helps avoid false sharing and heap blowup. Within our effort to design appropriate lock-free algorithms for NBMALLOC, we propose and show a lock-free implementation of a new data structure, flat-set, supporting conventional “internal” set operations as well as “inter-object” operations, for moving items between flat-sets. The design of NBMALLOC also involved a series of other algorithmic problems, which are discussed in the paper. Further, we present the implementation of NBMALLOC and a
Generic<Programming>: Lock-Free Data Structures
C++ Users Journal, 2004
Cited by 1 (0 self)
After Generic<Programming> has skipped one instance (it’s quite naïve, I know, to think that grad school asks for anything less than 100% of one’s time), there has been an embarrassment of riches as far as topic candidates for this article go. One topic candidate was a discussion of constructors, in particular forwarding constructors, handling exceptions, and two-stage object construction. One other topic candidate (and another glimpse into the Yaslander technology [2]) was creating containers (such as lists, vectors, or maps) of incomplete types, something that is possible with the help of an interesting set of tricks, but not guaranteed by the standard containers. While both candidates were interesting, they couldn’t stand a chance against lock-free data structures, which are all the rage in the multithreaded programming community. At this year’s PLDI conference

