A Practical Multi-Word Compare-and-Swap Operation
- In Proceedings of the 16th International Symposium on Distributed Computing, 2002
Cited by 60 (5 self)
Work on non-blocking data structures has proposed extending processor designs with a compare-and-swap primitive, CAS2, which acts on two arbitrary memory locations. Experience suggested that current operations, typically single-word compare-and-swap (CAS1), are not expressive enough to be used alone in an efficient manner. In this paper we build CAS2 from CAS1 and, in fact, build an arbitrary multi-word compare-and-swap (CASN). Our design requires only the primitives available on contemporary systems, reserves a small and constant amount of space in each word updated (either 0 or 2 bits), and permits non-overlapping updates to occur concurrently. This provides compelling evidence that current primitives are not only universal in the theoretical sense introduced by Herlihy but are also universal in their use as foundations for practical algorithms. It also provides a straightforward mechanism for deploying many of the interesting non-blocking data structures presented in the literature that have previously required CAS2.
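The abstract does not spell out how CASN is assembled from CAS1. The Java sketch below is a minimal illustration, assuming the descriptor-based scheme associated with this line of work: a restricted double-compare single-swap (RDCSS) step installs an operation descriptor in a word only while that operation is still undecided, and any thread that meets a descriptor helps finish it. `CasnArray`, `casn`, and `read` are hypothetical names. The words are slots of an `AtomicReferenceArray` holding non-null values compared with `equals`, and target indices must be strictly ascending so that helping cannot form cycles. Java's garbage collector stands in for the descriptor reclamation a C implementation would need, and `instanceof` checks stand in for the reserved tag bits.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical sketch: multi-word CAS over array slots, built only from single-word CAS. */
final class CasnArray {
    private enum Status { UNDECIDED, SUCCEEDED, FAILED }

    /** One CASN operation; it occupies each target slot while the operation is in progress. */
    private static final class CasnDesc {
        final AtomicReference<Status> status = new AtomicReference<>(Status.UNDECIDED);
        final int[] idx; final Object[] expected; final Object[] desired;
        CasnDesc(int[] idx, Object[] expected, Object[] desired) {
            this.idx = idx; this.expected = expected; this.desired = desired;
        }
    }

    /** RDCSS marker: the slot holds this while deciding whether `owner` may be installed. */
    private static final class RdcssDesc {
        final CasnDesc owner; final int idx; final Object old;
        RdcssDesc(CasnDesc owner, int idx, Object old) { this.owner = owner; this.idx = idx; this.old = old; }
    }

    // Assumption: instanceof on these private descriptor types replaces the paper's reserved tag bits.
    private final AtomicReferenceArray<Object> words;

    CasnArray(int size, Object initial) {   // initial must be non-null
        words = new AtomicReferenceArray<>(size);
        for (int i = 0; i < size; i++) words.set(i, initial);
    }

    /** Atomically: if every words[idx[k]] equals expected[k], store desired[k]. idx strictly ascending. */
    boolean casn(int[] idx, Object[] expected, Object[] desired) {
        if (expected.length != idx.length || desired.length != idx.length)
            throw new IllegalArgumentException("length mismatch");
        for (int k = 1; k < idx.length; k++)
            if (idx[k] <= idx[k - 1]) throw new IllegalArgumentException("indices must be strictly ascending");
        for (Object d : desired) if (d == null) throw new NullPointerException("values must be non-null");
        return help(new CasnDesc(idx.clone(), expected.clone(), desired.clone()));
    }

    /** Reads a slot, first helping any in-progress operation found there. */
    Object read(int i) {
        while (true) {
            Object v = words.get(i);
            if (v instanceof RdcssDesc) completeRdcss((RdcssDesc) v);
            else if (v instanceof CasnDesc) help((CasnDesc) v);
            else return v;
        }
    }

    private boolean help(CasnDesc cd) {
        if (cd.status.get() == Status.UNDECIDED) {
            // Phase 1: install cd into each target slot in ascending index order.
            Status s = Status.SUCCEEDED;
            for (int k = 0; k < cd.idx.length && s == Status.SUCCEEDED; ) {
                Object v = rdcss(cd, cd.idx[k], cd.expected[k]);
                if (v instanceof CasnDesc && v != cd) { help((CasnDesc) v); continue; } // help, retry slot k
                if (v != cd && !v.equals(cd.expected[k])) s = Status.FAILED;
                k++;
            }
            cd.status.compareAndSet(Status.UNDECIDED, s);   // only the first decision sticks
        }
        // Phase 2: swap cd out for the new values (success) or the old ones (failure).
        boolean ok = cd.status.get() == Status.SUCCEEDED;
        for (int k = 0; k < cd.idx.length; k++)
            words.compareAndSet(cd.idx[k], cd, ok ? cd.desired[k] : cd.expected[k]);
        return ok;
    }

    /** Installs owner into slot i only if the slot equals expected AND owner is UNDECIDED; returns what it saw. */
    private Object rdcss(CasnDesc owner, int i, Object expected) {
        while (true) {
            Object cur = words.get(i);
            if (cur instanceof RdcssDesc) { completeRdcss((RdcssDesc) cur); continue; }
            if (cur instanceof CasnDesc || !cur.equals(expected)) return cur;
            RdcssDesc rd = new RdcssDesc(owner, i, cur);
            if (words.compareAndSet(i, cur, rd)) { completeRdcss(rd); return cur; }
        }
    }

    private void completeRdcss(RdcssDesc rd) {
        boolean undecided = rd.owner.status.get() == Status.UNDECIDED;
        words.compareAndSet(rd.idx, rd, undecided ? rd.owner : rd.old);
    }
}
```

For example, on `new CasnArray(4, 0L)`, the call `casn(new int[]{0, 2}, new Object[]{0L, 0L}, new Object[]{1L, 2L})` returns `true` and leaves slots 0 and 2 holding 1 and 2; a later CASN that expects a stale value returns `false` and changes nothing. Operations on disjoint indices never meet each other's descriptors, which is how non-overlapping updates can proceed concurrently in this sketch.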
Two-Handed Emulation: How to build non-blocking implementations of complex data-structures using DCAS
- In Proceedings of the 21st Annual Symposium on Principles of Distributed Computing, 2002
Cited by 25 (0 self)
This paper partly addresses the question of whether, in principle, there is any point in adding richer hardware synchronization primitives when the existing set is "universal" and therefore sufficient to synchronize any data structure in a non-blocking manner. The context of this paper is the ongoing investigation of the utility of adding a DCAS instruction to modern processors to aid the design and performance of non-blocking algorithms. We add one more piece of evidence in support of this instruction.
CAS-based lock-free algorithm for shared deques
- In the 9th Euro-Par Conference on Parallel Processing, 2003
Cited by 12 (0 self)
This paper presents the first lock-free algorithm for shared double-ended queues (deques) based on the single-address atomic primitives CAS (Compare-and-Swap) or LL/SC (Load-Linked and Store-Conditional). The algorithm can use single-word primitives if the maximum deque size is static. To allow the deque's size to be dynamic, the algorithm employs single-address double-width primitives. Prior lock-free algorithms for shared deques depend on the strong DCAS (Double-Compare-and-Swap) atomic primitive, which most processor architectures do not support. The new algorithm offers significant advantages over prior lock-free shared deque algorithms with respect to performance and the strength of required primitives. In turn, lock-free algorithms provide significant reliability and performance advantages over lock-based implementations.
Lock-free Dynamically Resizable Arrays
Cited by 8 (5 self)
We present the first lock-free design and practical implementation of a dynamically resizable array (vector). The most extensively used container in the C++ Standard Library is vector, which combines dynamic memory management with efficient random access. Our approach is based on a single-word (32-bit) atomic compare-and-swap (CAS) instruction, and our implementation is portable to all systems supporting CAS, and more. It provides a flexible, generic, linearizable, and highly parallelizable STL-like interface, effective lock-free memory allocation and management, and fast execution. Our current implementation is designed to be most efficient on the most recent multi-core architectures. Test cases on a dual-core Intel processor indicate that our lock-free vector outperforms both its lock-based STL counterpart and the latest concurrent vector implementation provided by Intel by a factor of 10. The approach is also applicable across a variety of symmetric multiprocessing (SMP) platforms: a performance evaluation on an 8-way AMD system with non-shared L2 cache showed timing results comparable to the best available lock-based techniques for such systems. The design implements the most common STL vector interfaces, namely random-access read and write, tail insertion and deletion, pre-allocation of memory, and query of the container's size. Keywords: lock-free, STL, C++, vector, concurrency, real-time systems
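Tail insertion and random access are the core vector operations named above. As a much smaller, hypothetical illustration (not the authors' design, which also handles growth, deletion, and size queries), the Java sketch below claims tail slots in a fixed-capacity, append-only vector with a single-word CAS retry loop and then publishes each element. `AppendOnlyVector` and its methods are invented for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical sketch: fixed-capacity, append-only vector whose tail is claimed with CAS. */
final class AppendOnlyVector<T> {
    private final AtomicReferenceArray<T> slots;                // null means "claimed but not yet written"
    private final AtomicInteger reserved = new AtomicInteger();  // number of claimed slots

    AppendOnlyVector(int capacity) {
        slots = new AtomicReferenceArray<>(capacity);
    }

    /** Claims the next tail index with a CAS retry loop, then publishes value; returns the index or -1 if full. */
    int pushBack(T value) {
        if (value == null) throw new NullPointerException("null is reserved for unwritten slots");
        int n;
        do {
            n = reserved.get();
            if (n == slots.length()) return -1;          // this sketch never grows
        } while (!reserved.compareAndSet(n, n + 1));     // exactly one thread moves the tail from n to n + 1
        slots.set(n, value);                             // volatile write makes the element visible to readers
        return n;
    }

    /** Random-access read; may return null if index i was claimed but its writer has not finished. */
    T get(int i) {
        if (i < 0 || i >= reserved.get()) throw new IndexOutOfBoundsException("index " + i);
        return slots.get(i);
    }

    /** Number of claimed slots (some may still be unwritten). */
    int size() {
        return reserved.get();
    }
}
```

Because each slot index is handed out by a successful CAS on the counter, two concurrent `pushBack` calls can never write the same slot. A reader may still see `null` at an index whose slot is claimed but not yet written; closing that gap, together with deletion and resizing, requires the additional machinery a full lock-free vector provides.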
Atomic Instructions in Java
- In Magnusson [14], 2002
Cited by 4 (0 self)
Atomic instructions atomically access and update one or more memory locations. Because they do not incur the overhead of lock acquisition or suspend the executing thread during contention, they may allow higher levels of concurrency on multiprocessors than lock-based synchronization. Wait-free data structures are an important application of atomic instructions and extend these performance benefits to higher-level abstractions such as queues. In type-unsafe languages such as C, atomic instructions can be expressed in terms of operations on memory addresses. However, type-safe languages such as Java do not allow manipulation of arbitrary memory locations. Adding support for atomic instructions to Java is therefore an interesting and important challenge.
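Java 5 and later expose CAS in a type-safe way through objects in `java.util.concurrent.atomic` rather than through raw addresses; that package postdates this 2002 paper and is not its proposal. The minimal sketch below (the `AtomicMax` class is hypothetical) uses `AtomicLong.compareAndSet` in a retry loop to maintain a shared maximum without locks.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: a shared running maximum updated with a CAS retry loop. */
final class AtomicMax {
    private final AtomicLong max;

    AtomicMax(long initial) {
        max = new AtomicLong(initial);
    }

    /** Raises the maximum to candidate if larger; returns the maximum as of when this call took effect. */
    long offer(long candidate) {
        while (true) {
            long cur = max.get();
            if (candidate <= cur) return cur;                   // already at least as large: nothing to write
            if (max.compareAndSet(cur, candidate)) return candidate;
            // CAS failed: another thread changed max after we read it, so re-read and retry.
        }
    }

    long get() {
        return max.get();
    }
}
```

The loop is lock-free but not wait-free: a CAS fails only because another thread's update succeeded, yet an unlucky thread could keep retrying while others keep raising the value. `AtomicLong.accumulateAndGet(candidate, Math::max)` performs the same update in a single library call.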
Built-in coloring for highly-concurrent doubly-linked lists (Extended Abstract)
- 2006
Cited by 4 (0 self)
This paper presents a novel approach for lock-free implementations of concurrent data structures, based on dynamically maintaining a coloring of the data structure's items. Roughly speaking, the data structure's operations are implemented by acquiring virtual locks on several items of the data structure and then making the changes atomically; this simplifies the design and provides clean functionality. The virtual locks are managed with CAS or DCAS primitives, and helping is used to guarantee progress; virtual locks are acquired according to a coloring order that decreases the length of waiting chains and increases concurrency. Coming back full circle, the legality of the coloring is preserved by having operations correctly update the colors of the items they modify. The benefits of the scheme are demonstrated with new nonblocking implementations of doubly-linked list data structures: a DCAS-based implementation of a doubly-linked list allowing insertions and removals anywhere, and CAS-based implementations in which removals are allowed only at the ends of the list (insertions can occur anywhere). The implementations possess several attractive features: they do not bound the list size, they do not leave accessible chains of garbage nodes, and they allow operations to proceed concurrently, without interfering with each other, if they are applied to non-adjacent nodes in the list.
Reliable and Efficient Concurrent Synchronization for Embedded Real-Time Software
The high degree of autonomy and increased complexity of future robotic spacecraft pose significant challenges in assuring their reliability and efficiency. To achieve fast and safe concurrent interactions in mission-critical code, we survey practical, state-of-the-art nonblocking programming techniques. We study two nonblocking approaches in detail: (1) CAS-based algorithms and (2) Software Transactional Memory. We evaluate the strengths and weaknesses of each approach by applying it to the design and implementation of a nonblocking shared vector. Our study investigates how nonblocking synchronization can help eliminate deadlock, livelock, and priority inversion while also improving performance in embedded real-time software.

