Concurrent Programming Without Locks
2004
Cited by 64 (3 self)
Mutual exclusion locks remain the de facto mechanism for concurrency control on shared-memory data structures. However, their apparent simplicity is deceptive: it is hard to design scalable locking strategies because locks can harbour problems such as priority inversion, deadlock and convoying. Furthermore, scalable lock-based systems are not readily composable when building compound operations. In looking for solutions to these problems, interest has developed in nonblocking systems which have promised scalability and robustness by eschewing mutual exclusion while still ensuring safety. However, existing techniques for building non-blocking systems are rarely suitable for practical use, imposing substantial storage overheads, serialising non-conflicting operations, or requiring instructions not readily available on today’s CPUs. In this paper we present three APIs which make it easier to develop non-blocking implementations of arbitrary data structures. The first API is a multi-word compare-and-swap operation (MCAS) which atomically updates a set of memory locations. This can be used to advance a data structure from one consistent state to another. The second API is a word-based software transactional memory (WSTM) which can allow sequential code to be re-used more directly than with MCAS and which provides better scalability when locations are being read rather than being
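The MCAS interface described in this abstract can be illustrated with a small sketch. The version below only models the all-or-nothing behaviour and uses a single lock to do so, so it is deliberately not non-blocking and should not be read as the paper's implementation. The names `Cell` and `mcas` are hypothetical.

```python
import threading


class Cell:
    """A single shared memory word (hypothetical helper, not from the paper)."""

    def __init__(self, value):
        self.value = value


# One global lock stands in for atomicity here; a real MCAS avoids locks.
_mcas_lock = threading.Lock()


def mcas(updates):
    """Apply a list of (cell, expected, new) triples as one atomic step.

    Every new value is written only if every cell currently holds its
    expected value; otherwise nothing is written and False is returned.
    """
    with _mcas_lock:
        if any(cell.value != expected for cell, expected, _ in updates):
            return False
        for cell, _, new in updates:
            cell.value = new
        return True


head, tail = Cell(1), Cell(2)
ok = mcas([(head, 1, 10), (tail, 2, 20)])      # both match, both are updated
stale = mcas([(head, 1, 11), (tail, 20, 21)])  # head no longer holds 1, nothing changes
```

The second call shows the property that lets MCAS advance a structure from one consistent state to another: one stale expectation leaves every location untouched.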
The ATOMOS Transactional Programming Language
2006
Cited by 53 (4 self)
Atomos is the first programming language with implicit transactions, strong atomicity, and a scalable multiprocessor implementation. Atomos is derived from Java, but replaces its synchronization and conditional waiting constructs with simpler transactional alternatives. The Atomos watch statement allows programmers to specify fine-grained watch sets used with the Atomos retry conditional waiting statement for efficient transactional conflict-driven wakeup even in transactional memory systems with a limited number of transactional contexts. Atomos supports open-nested transactions, which are necessary for building both scalable application programs and virtual machine implementations.
Enforcing isolation and ordering in STM
In the Proceedings of the Conference on Programming Language Design and Implementation, 2007
Cited by 41 (6 self)
Transactional memory provides a new concurrency control mechanism that avoids many of the pitfalls of lock-based synchronization. High-performance software transactional memory (STM) implementations thus far provide weak atomicity: Accessing shared data both inside and outside a transaction can result in unexpected, implementation-dependent behavior. To guarantee isolation and consistent ordering in such a system, programmers are expected to enclose all shared-memory accesses inside transactions. A system that provides strong atomicity guarantees isolation even in the presence of threads that access shared data outside transactions. A strongly-atomic system also orders transactions with conflicting non-transactional memory operations in a consistent manner. In this paper, we discuss some surprising pitfalls of weak atomicity, and we present an STM system that avoids these problems
Making the fast case common and the uncommon case simple in unbounded transactional memory
In ISCA, 2007
Cited by 35 (4 self)
Hardware transactional memory has great potential to simplify the creation of correct and efficient multithreaded programs, allowing programmers to exploit more effectively the soon-to-be-ubiquitous multi-core designs. Several recent proposals have extended the original bounded transactional memory to unbounded transactional memory, a crucial step toward transactions becoming a general-purpose primitive. Unfortunately, supporting the concurrent execution of an unbounded number of unbounded transactions is challenging, and as a result, many proposed implementations are complex. This paper explores a different approach. First, we introduce the permissions-only cache to extend the bound at which transactions overflow to allow the fast, bounded case to be used as frequently as possible. Second, we propose ONETM to simplify the implementation of unbounded transactional memory by bounding the concurrency
The Common Case Transactional Behavior of Multithreaded Programs
In Proceedings of the 12th International Conference on High-Performance Computer Architecture, 2006
Cited by 34 (6 self)
Transactional memory (TM) provides an easy-to-use and high-performance parallel programming model for the upcoming chip-multiprocessor systems. Several researchers have proposed alternative hardware and software TM implementations. However, the lack of transaction-based programs makes it difficult to understand the merits of each proposal and to tune future TM implementations to the common case behavior of real applications. This work addresses this problem by analyzing the common case transactional behavior for 35 multithreaded programs from a wide range of application domains. We identify transactions within the source code by mapping existing primitives for parallelism and synchronization management to transaction boundaries. The analysis covers basic characteristics such as transaction length, distribution of read-set and write-set size, and the frequency of nesting and I/O operations. The measured characteristics provide key insights into the design of efficient TM systems for both nonblocking synchronization and speculative parallelization.
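To make the measured characteristics concrete, here is a minimal sketch of how read-set and write-set sizes could be derived from a recorded access trace. The trace format, the function name `summarize`, and the choice to measure transaction length as a count of memory accesses are all assumptions made for illustration, not the paper's methodology.

```python
def summarize(transactions):
    """Compute per-transaction statistics from access traces.

    Each transaction is a list of (op, address) pairs, where op is
    "r" for a read or "w" for a write (an assumed trace format).
    """
    stats = []
    for accesses in transactions:
        read_set = {addr for op, addr in accesses if op == "r"}
        write_set = {addr for op, addr in accesses if op == "w"}
        stats.append({
            "length": len(accesses),        # number of accesses, not instructions
            "read_set": len(read_set),      # distinct addresses read
            "write_set": len(write_set),    # distinct addresses written
        })
    return stats


trace = [
    [("r", 0x10), ("r", 0x10), ("w", 0x10)],              # read-modify-write of one word
    [("r", 0x20), ("r", 0x24), ("w", 0x28), ("w", 0x28)],
]
result = summarize(trace)
```

Repeated accesses to the same address count once toward a set size but every time toward the length, which is why the two measures can diverge.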
A Case for an Interleaving Constrained Shared-Memory Multi-Processor
In ISCA, 2009
Cited by 25 (2 self)
Shared-memory multi-threaded programming is inherently more difficult than single-threaded programming. The main source of complexity is that the threads of an application can interleave in so many different ways. To ensure correctness, a programmer has to test all possible thread interleavings, which is impractical. Many rare thread interleavings remain untested in production systems, and they are the root cause of a majority of concurrency bugs. We propose a shared-memory multiprocessor design that avoids untested interleavings to improve the correctness of a multi-threaded program. Since untested interleavings tend to occur infrequently at runtime, the performance cost of avoiding them is not high. We propose to encode the set of tested correct interleavings in a program’s binary executable using Predecessor Set (PSet) constraints. These constraints are efficiently enforced at runtime using processor support, which ensures that the runtime follows a tested interleaving. We analyze several bugs in open source applications such as MySQL, Apache, Mozilla, etc., and show that, by enforcing PSet constraints, we can avoid not only data races and atomicity violations, but also other forms of concurrency bugs.
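The idea of recording tested interleavings as predecessor constraints can be sketched in software. The abstract does not define PSet constraints precisely, so the sketch below rests on a simplifying assumption: for each instruction, it records which instructions from other threads were the last writer of the accessed address in a tested run, and it flags any later run in which an unseen remote writer appears. The trace format and function names are hypothetical, and the paper enforces its constraints with processor support rather than in software.

```python
from collections import defaultdict


def learn_psets(traces):
    """Build predecessor sets from tested traces.

    A trace is a list of (thread, instr, op, addr) tuples, op in {"r", "w"}.
    """
    psets = defaultdict(set)
    for trace in traces:
        last_writer = {}  # addr -> (thread, instr) of the most recent write
        for thread, instr, op, addr in trace:
            prev = last_writer.get(addr)
            if prev is not None and prev[0] != thread:
                psets[instr].add(prev[1])
            if op == "w":
                last_writer[addr] = (thread, instr)
    return psets


def violations(trace, psets):
    """Return (instr, remote_writer) pairs never observed during testing."""
    bad = []
    last_writer = {}
    for thread, instr, op, addr in trace:
        prev = last_writer.get(addr)
        if prev is not None and prev[0] != thread:
            if prev[1] not in psets.get(instr, set()):
                bad.append((instr, prev[1]))
        if op == "w":
            last_writer[addr] = (thread, instr)
    return bad


# Tested run: thread 1 finishes its read-then-write before thread 2 writes.
tested = [(1, "I1", "r", "x"), (1, "I2", "w", "x"), (2, "I3", "w", "x")]
# Untested run: thread 2's write lands between thread 1's read and write.
untested = [(1, "I1", "r", "x"), (2, "I3", "w", "x"), (1, "I2", "w", "x")]

psets = learn_psets([tested])
clean = violations(tested, psets)
flagged = violations(untested, psets)
```

In the untested run, instruction `I2` sees `I3` as its remote predecessor, which never happened during testing, so the interleaving that breaks the read-then-write sequence is flagged.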
An Integrated Hardware-Software Approach to Flexible Transactional Memory
2006
Cited by 24 (8 self)
There has been considerable recent interest in both hardware and software transactional memory (TM). We present an intermediate approach, in which hardware serves to accelerate a TM implementation controlled fundamentally by software. Specifically, we describe an alert on update mechanism (AOU) that allows a thread to receive fast, asynchronous notification when previously-identified lines are written by other threads, and a programmable data isolation mechanism (PDI) that allows a thread to hide its speculative writes from other threads, ignoring conflicts, until software decides to make them visible. These mechanisms reduce bookkeeping, validation, and copying overheads without constraining software policy on a host of design decisions. We have used AOU and PDI to implement a hardware-accelerated software transactional memory system we call RTM.
Characterization of TCC on Chip-Multiprocessors
In Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, 2005
Cited by 17 (9 self)
Transactional Coherence and Consistency (TCC) is a novel coherence scheme for shared memory multiprocessors that uses programmer-defined transactions as the fundamental unit of parallel work, synchronization, coherence, and consistency. TCC has the potential to simplify parallel program development and optimization by providing a smooth transition from sequential to parallel programs. In this paper, we study the implementation of TCC on chip-multiprocessors (CMPs). We explore design alternatives such as the granularity of state tracking, double-buffering, and write-update and write-invalidate protocols. Furthermore, we characterize the performance of TCC in comparison to conventional snoopy cache coherence (SCC) using parallel applications optimized for each scheme. We conclude that the two coherence schemes perform similarly, with each scheme having a slight advantage for some applications. The bandwidth requirements of TCC are slightly higher but well within the capabilities of CMP systems. Also, we find that overflow of speculative state can be effectively handled by a simple victim cache. Our results suggest TCC can provide its programming advantages without compromising the performance expected from well-tuned parallel applications.
Transactional Execution of Java Programs
2005
Cited by 14 (1 self)
Parallel programming is difficult due to the complexity of dealing with conventional lock-based synchronization. To simplify parallel programming, there have been a number of proposals to support transactions directly in hardware and eliminate locks completely. Although hardware support for transactions has the potential to completely change the way parallel programs are written, initially transactions will be used to execute existing parallel programs. In this paper we investigate the implications of using transactions to execute existing parallel Java programs. Our results show that transactions can be used to support all aspects of Java multithreaded programs. Moreover, the conversion of a lock-based application into transactions is largely straightforward. The performance that these converted applications achieve is equal to or sometimes better than the original lock-based implementation.
DracoSTM: A practical C++ approach to software transactional memory
In ACM SIGPLAN Library-Centric Software Design (LCSD), 2007
Cited by 14 (7 self)
Transactional memory (TM) is a recent parallel programming concept which reduces challenges found in parallel programming. TM offers numerous advantages over other synchronization mechanisms, yet many current TM systems require complex hardware, programming language extensions, specific compiler support or enforce impractical software design, making these models unrealistic as an immediate TM solution for early adopters. Our new software transactional memory (STM) system, DracoSTM, is a high-performance lock-based C++ STM research library. DracoSTM uses only native object-oriented language semantics, increasing its intuitiveness for developers while maintaining high programmability via automatic handling of composition, locks and transaction termination. DracoSTM is the first STM solution to (1) implement both direct and deferred updating and (2) enable run-time alternation between these updating policies. DracoSTM requires no language extension, specific development environment or platform, widening its usability and increasing the novelty of its design. This paper describes DracoSTM from an architectural infrastructure viewpoint. TM-specific and library-specific aspects are discussed, as are their cross-cutting design concerns. Finally, performance benchmarks are presented, showing that DracoSTM outperforms another high-performing C++ STM library by upwards of two orders of magnitude.
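The difference between direct and deferred updating can be shown with a small single-threaded sketch. It is written in Python for brevity rather than C++, omits locking and conflict detection entirely, and uses hypothetical names (`TVar`, `Transaction`, the `deferred` flag), so it illustrates the two policies in general rather than DracoSTM's API.

```python
class TVar:
    """A transactional variable (hypothetical name)."""

    def __init__(self, value):
        self.value = value


class Transaction:
    """Sketch of two update policies, selectable per transaction."""

    def __init__(self, deferred=True):
        self.deferred = deferred
        self.write_buffer = {}  # deferred: TVar -> pending value
        self.undo_log = {}      # direct: TVar -> value before first write

    def read(self, var):
        # A deferred transaction must see its own buffered writes.
        if self.deferred and var in self.write_buffer:
            return self.write_buffer[var]
        return var.value

    def write(self, var, value):
        if self.deferred:
            self.write_buffer[var] = value             # memory untouched until commit
        else:
            self.undo_log.setdefault(var, var.value)   # remember the original once
            var.value = value                          # update memory in place

    def commit(self):
        for var, value in self.write_buffer.items():
            var.value = value
        self.write_buffer.clear()
        self.undo_log.clear()

    def abort(self):
        for var, old in self.undo_log.items():
            var.value = old
        self.write_buffer.clear()
        self.undo_log.clear()


x = TVar(0)

t1 = Transaction(deferred=True)
t1.write(x, 5)
seen_before_commit = x.value     # memory is unchanged until commit
t1.commit()
after_deferred_commit = x.value

t2 = Transaction(deferred=False)
t2.write(x, 9)
seen_during_direct = x.value     # memory already holds the new value
t2.abort()
after_direct_abort = x.value     # the undo log restores the old value
```

The trade-off is visible in where the work happens: deferred updating makes aborts cheap and commits costly, while direct updating makes commits cheap and aborts costly. Selecting the policy per transaction, as the flag does here, is one simple way to picture alternating between them at run time.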

