Results 1 - 10
of
335
Composable memory transactions
- In Symposium on Principles and Practice of Parallel Programming (PPoPP
, 2005
"... Atomic blocks allow programmers to delimit sections of code as ‘atomic’, leaving the language’s implementation to enforce atomicity. Existing work has shown how to implement atomic blocks over word-based transactional memory that provides scalable multiprocessor performance without requiring changes ..."
Abstract
-
Cited by 506 (42 self)
- Add to MetaCart
(Show Context)
Atomic blocks allow programmers to delimit sections of code as ‘atomic’, leaving the language’s implementation to enforce atomicity. Existing work has shown how to implement atomic blocks over word-based transactional memory that provides scalable multiprocessor performance without requiring changes to the basic structure of objects in the heap. However, these implementations perform poorly because they interpose on all accesses to shared memory in the atomic block, redirecting updates to a thread-private log which must be searched by reads in the block and later reconciled with the heap when leaving the block. This paper takes a four-pronged approach to improving performance: (1) we introduce a new ‘direct access ’ implementation that avoids searching thread-private logs, (2) we develop compiler optimizations to reduce the amount of logging (e.g. when a thread accesses the same data repeatedly in an atomic block), (3) we use runtime filtering to detect duplicate log entries that are missed statically, and (4) we present a series of GC-time techniques to compact the logs generated by long-running atomic blocks. Our implementation supports short-running scalable concurrent benchmarks with less than 50 % overhead over a non-thread-safe baseline. We support long atomic blocks containing millions of shared memory accesses with a 2.5-4.5x slowdown. Categories and Subject Descriptors D.3.3 [Programming Languages]:
Transactional Locking II
- In Proc. of the 20th Intl. Symp. on Distributed Computing
, 2006
"... Abstract. The transactional memory programming paradigm is gaining momentum as the approach of choice for replacing locks in concurrent programming. This paper introduces the transactional locking II (TL2) algorithm, a software transactional memory (STM) algorithm based on a combination of commit-ti ..."
Abstract
-
Cited by 348 (17 self)
- Add to MetaCart
(Show Context)
Abstract. The transactional memory programming paradigm is gaining momentum as the approach of choice for replacing locks in concurrent programming. This paper introduces the transactional locking II (TL2) algorithm, a software transactional memory (STM) algorithm based on a combination of commit-time locking and a novel global version-clock based validation technique. TL2 improves on state-of-the-art STMs in the following ways: (1) unlike all other STMs it fits seamlessly with any systems memory life-cycle, including those using malloc/free (2) unlike all other lock-based STMs it efficiently avoids periods of unsafe execution, that is, using its novel version-clock validation, user code is guaranteed to operate only on consistent memory states, and (3) in a sequence of high performance benchmarks, while providing these new properties, it delivered overall performance comparable to (and in many cases better than) that of all former STM algorithms, both lock-based and non-blocking. Perhaps more importantly, on various benchmarks, TL2 delivers performance that is competitive with the best hand-crafted fine-grained concurrent structures. Specifically, it is ten-fold faster than a single lock. We believe these characteristics make TL2 a viable candidate for deployment of transactional memory today, long before hardware transactional support is available. 1
Logtm: Log-based transactional memory
- in HPCA
, 2006
"... Transactional memory (TM) simplifies parallel programming by guaranteeing that transactions appear to execute atomically and in isolation. Implementing these properties includes providing data version management for the simultaneous storage of both new (visible if the transaction commits) and old (r ..."
Abstract
-
Cited by 279 (11 self)
- Add to MetaCart
(Show Context)
Transactional memory (TM) simplifies parallel programming by guaranteeing that transactions appear to execute atomically and in isolation. Implementing these properties includes providing data version management for the simultaneous storage of both new (visible if the transaction commits) and old (retained if the transaction aborts) values. Most (hardware) TM systems leave old values “in place” (the target memory address) and buffer new values elsewhere until commit. This makes aborts fast, but penalizes (the much more frequent) commits. In this paper, we present a new implementation of transactional memory, Log-based Transactional Memory (LogTM), that makes commits fast by storing old values to a per-thread log in cacheable virtual memory and storing new values in place. LogTM makes two additional contributions. First, LogTM extends a MOESI directory protocol to enable both fast conflict detection on evicted blocks and fast commit (using lazy cleanup). Second, LogTM handles aborts in (library) software with little performance penalty. Evaluations running micro- and SPLASH-2 benchmarks on a 32way multiprocessor support our decision to optimize for commit by showing that only 1-2 % of transactions abort. 1.
On the correctness of transactional memory
- In PPoPP
, 2008
"... Transactional memory (TM) is an appealing abstraction for programming multi-core systems. Potential target applications for TM, such as business software and video games, are likely to involve complex data structures and large transactions, requiring specific software solutions (STM). So far, howeve ..."
Abstract
-
Cited by 204 (28 self)
- Add to MetaCart
Transactional memory (TM) is an appealing abstraction for programming multi-core systems. Potential target applications for TM, such as business software and video games, are likely to involve complex data structures and large transactions, requiring specific software solutions (STM). So far, however, STMs have been mainly evaluated and optimized for smaller scale benchmarks. We revisit the main STM design choices from the perspective of complex workloads and propose a new STM, which we call SwissTM. In short, SwissTM is lock- and word-based and uses (1) optimistic (commit-time) conflict detection for read/write conflicts and pessimistic (encounter-time) conflict detection for write/write conflicts, as well as (2) a new two-phase contention manager that ensures the progress of long transactions while inducing no overhead on short ones. SwissTM outperforms state-of-theart STM implementations, namely RSTM, TL2, and TinySTM, in our experiments on STMBench7, STAMP, Lee-TM and red-black tree benchmarks. Beyond SwissTM, we present the most complete evaluation to date of the individual impact of various STM design choices on the ability to support the mixed workloads of large applications.
Hybrid transactional memory
- IN PROC. 12TH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOSXII
, 2006
"... Transactional memory (TM) promises to substantially reduce the difficulty of writing correct, efficient, and scalable concurrent programs. But “bounded ” and “best-effort” hardware TM proposals impose unreasonable constraints on programmers, while more flexible software TM implementations are consid ..."
Abstract
-
Cited by 170 (11 self)
- Add to MetaCart
Transactional memory (TM) promises to substantially reduce the difficulty of writing correct, efficient, and scalable concurrent programs. But “bounded ” and “best-effort” hardware TM proposals impose unreasonable constraints on programmers, while more flexible software TM implementations are considered too slow. Proposals for supporting “unbounded” transactions in hardware entail significantly higher complexity and risk than best-effort designs. We introduce Hybrid Transactional Memory (HyTM), an approach to implementing TM in software so that it can use best-effort hardware TM (HTM) to boost performance but does not depend on HTM. Thus programmers can develop and test transactional programs in existing systems today, and can enjoy the performance benefits of HTM support when it becomes available. We describe our prototype HyTM system, comprising a compiler and a library. The compiler allows a transaction to be attempted using best-effort HTM, and retried using the software library if it fails. We have used our prototype to “transactify ” part of the Berkeley DB system, as well as several benchmarks. By disabling the optional use of HTM, we can run all of these tests on existing systems. Furthermore, by using a simulated multiprocessor with HTM support, we demonstrate the viability of the HyTM approach: it can provide performance and scalability approaching that of an unbounded HTM implementation, without the need to support all transactions with complicated HTM support.
Concurrent Programming Without Locks
, 2004
"... Mutual exclusion locks remain the de facto mechanism for concurrency control on shared-memory data structures. However, their apparent simplicity is deceptive: it is hard to design scalable locking strategies because locks can harbour problems such as priority inversion, deadlock and convoying. Furt ..."
Abstract
-
Cited by 123 (4 self)
- Add to MetaCart
(Show Context)
Mutual exclusion locks remain the de facto mechanism for concurrency control on shared-memory data structures. However, their apparent simplicity is deceptive: it is hard to design scalable locking strategies because locks can harbour problems such as priority inversion, deadlock and convoying. Furthermore, scalable lock-based systems are not readily composable when building compound operations. In looking for solutions to these problems, interest has developed in nonblocking systems which have promised scalability and robustness by eschewing mutual exclusion while still ensuring safety. However, existing techniques for building non-blocking systems are rarely suitable for practical use, imposing substantial storage overheads, serialising non-conflicting operations, or requiring instructions not readily available on today’s CPUs. In this paper we present three APIs which make it easier to develop non-blocking implementations of arbitrary data structures. The first API is a multi-word compare-and-swap operation (MCAS) which atomically updates a set of memory locations. This can be used to advance a data structure from one consistent state to another. The second API is a word-based software transactional memory (WSTM) which can allow sequential code to be re-used more directly than with MCAS and which provides better scalability when locations are being read rather than being
Subtleties of transactional memory atomicity semantics
- IEEE Comput. Archit. Lett
, 2006
"... ..."
(Show Context)
Logtm-se: Decoupling hardware transactional memory from caches
- In High Performance Computer Architecture, 2007. HPCA 2007. IEEE 13th International Symposium on
, 2007
"... This paper proposes a hardware transactional memory (HTM) system called LogTM Signature Edition (LogTM-SE). LogTM-SE uses signatures to summarize a transaction’s read-and write-sets and detects conflicts on coherence requests (eager conflict detection). Transactions update memory “in place ” after s ..."
Abstract
-
Cited by 109 (15 self)
- Add to MetaCart
(Show Context)
This paper proposes a hardware transactional memory (HTM) system called LogTM Signature Edition (LogTM-SE). LogTM-SE uses signatures to summarize a transaction’s read-and write-sets and detects conflicts on coherence requests (eager conflict detection). Transactions update memory “in place ” after saving the old value in a per-thread memory log (eager version management). Finally, a transaction commits locally by clearing its signature, resetting the log pointer, etc., while aborts must undo the log. LogTM-SE achieves two key benefits. First, signatures and logs can be implemented without changes to highly-optimized cache arrays because LogTM-SE never moves cached data, changes a block’s cache state, or flash clears bits in the cache. Second, transactions are more easily virtualized because sig-natures and logs are software accessible, allowing the operating system and runtime to save and restore this state. In particu-lar, LogTM-SE allows cache victimization, unbounded nesting (both open and closed), thread context switching and migra-tion, and paging. 1
Ctrigger: exposing atomicity violation bugs from their hiding places
- In ASPLOS
, 2009
"... Multicore hardware is making concurrent programs pervasive. Unfortunately, concurrent programs are prone to bugs. Among different types of concurrency bugs, atomicity violation bugs are common and important. Existing techniques to detect atomicity violation bugs suffer from one limitation: requiring ..."
Abstract
-
Cited by 109 (16 self)
- Add to MetaCart
(Show Context)
Multicore hardware is making concurrent programs pervasive. Unfortunately, concurrent programs are prone to bugs. Among different types of concurrency bugs, atomicity violation bugs are common and important. Existing techniques to detect atomicity violation bugs suffer from one limitation: requiring bugs to manifest during monitored runs, which is an open problem in concurrent program testing. This paper makes two contributions. First, it studies the interleaving characteristics of the common practice in concurrent program testing (i.e., running a program over and over) to understand why atomicity violation bugs are hard to expose. Second, it proposes CTrigger to effectively and efficiently expose atomicity violation bugs in large programs. CTrigger focuses on a special type of interleavings (i.e., unserializable
Hybrid Transactional Memory
, 2006
"... High performance parallel programs are currently difficult to write and debug. One major source of difficulty is protecting concurrent accesses to shared data with an appropriate synchronization mechanism. Locks are the most common mechanism but they have a number of disadvantages, including possibl ..."
Abstract
-
Cited by 103 (1 self)
- Add to MetaCart
High performance parallel programs are currently difficult to write and debug. One major source of difficulty is protecting concurrent accesses to shared data with an appropriate synchronization mechanism. Locks are the most common mechanism but they have a number of disadvantages, including possibly unnecessary serialization, and possible deadlock. Transactional memory is an alternative mechanism that makes parallel programming easier. With transactional memory, a transaction provides atomic and serializable operations on an arbitrary set of memory locations. When a transaction commits, all operations within the transaction become visible to other threads. When it aborts, all operations in the transaction are rolled back. Transactional