Results 1 -
8 of
8
Transactional memory coherence and consistency
- In ISCA
, 2004
"... In this paper, we propose a new shared memory model: Transactional ..."
Abstract
-
Cited by 138 (13 self)
- Add to MetaCart
In this paper, we propose a new shared memory model: Transactional
Programming with transactional coherence and consistency (tcc
- In ASPLOS-XI: Proceedings of the 11th international conference on Architectural
, 2004
"... Transactional Coherence and Consistency (TCC) offers a way to simplify parallel programming by executing all code within transactions. In TCC systems, transactions serve as the fundamental unit of parallel work, communication and coherence. As each transaction completes, it writes all of its newly p ..."
Abstract
-
Cited by 64 (9 self)
- Add to MetaCart
Transactional Coherence and Consistency (TCC) offers a way to simplify parallel programming by executing all code within transactions. In TCC systems, transactions serve as the fundamental unit of parallel work, communication and coherence. As each transaction completes, it writes all of its newly produced state to shared memory atomically, while restarting other processors that have speculatively read stale data. With this mechanism, a TCCbased system automatically handles data synchronization correctly, without programmer intervention. To gain the benefits of TCC, programs must be decomposed into transactions. We describe two basic programming language constructs for decomposing programs into transactions, a loop conversion syntax and a general transaction-forking mechanism. With these constructs, writing correct parallel programs requires only small, incremental changes to correct sequential programs. The performance of these programs may then easily be optimized, based on feedback from real program execution, using a few simple techniques.
The Common Case Transactional Behavior of Multithreaded Programs
- In Proceedings of the 12th International Conference on High-Performance Computer Architecture
, 2006
"... Transactional memory (TM) provides an easy-to-use and high-performance parallel programming model for the upcoming chip-multiprocessor systems. Several researchers have proposed alternative hardware and software TM implementations. However, the lack of transaction-based programs makes it difficult t ..."
Abstract
-
Cited by 34 (6 self)
- Add to MetaCart
Transactional memory (TM) provides an easy-to-use and high-performance parallel programming model for the upcoming chip-multiprocessor systems. Several researchers have proposed alternative hardware and software TM implementations. However, the lack of transaction-based programs makes it difficult to understand the merits of each proposal and to tune future TM implementations to the common case behavior of real application. This work addresses this problem by analyzing the common case transactional behavior for 35 multithreaded programs from a wide range of application domains. We identify transactions within the source code by mapping existing primitives for parallelism and synchronization management to transaction boundaries. The analysis covers basic characteristics such as transaction length, distribution of readset and write-set size, and the frequency of nesting and I/O operations. The measured characteristics provide key insights into the design of efficient TM systems for both nonblocking synchronization and speculative parallelization. 1.
SMTp: An Architecture for Next-generation Scalable Multi-threading
- In ISCA’04
, 2004
"... We introduce the SMTp architecture—an SMT processor augmented with a coherence protocol thread context, that together with a standard integrated memory controller can enable the design of (among other possibilities) scalable cache-coherent hardware distributed shared memory (DSM) machines from commo ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
We introduce the SMTp architecture—an SMT processor augmented with a coherence protocol thread context, that together with a standard integrated memory controller can enable the design of (among other possibilities) scalable cache-coherent hardware distributed shared memory (DSM) machines from commodity nodes. We describe the minor changes needed to a conventional out-of-order multithreaded core to realize SMTp, discussing issues related to both deadlock avoidance and performance. We then compare SMTp performance to that of various conventional DSM machines with normal SMT processors both with and without integrated memory controllers. On configurations from 1 to 32 nodes, with 1 to 4 application threads per node, we find that SMTp delivers performance comparable to, and sometimes better than, machines with more complex integrated DSM-specific memory controllers. Our results also show that the protocol thread has extremely low pipeline overhead. Given the simplicity and the flexibility of the SMTp mechanism, we argue that next-generation multithreaded processors with integrated memory controllers should adopt this mechanism as a way of building less complex high-performance DSM multiprocessors. 1.
Transactional memory coherence and consistency (TCC)
- IN PROCEEDINGS OF THE 11TH INTL. SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA
, 2004
"... The Transactional memory Coherence and Consistency (TCC) provides a shared memory model in which atomic transactions are always the basic unit of parallel work, communication, memory coherence, and memory reference consistency. TCC greatly simplifies parallel software by eliminating the need for syn ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
The Transactional memory Coherence and Consistency (TCC) provides a shared memory model in which atomic transactions are always the basic unit of parallel work, communication, memory coherence, and memory reference consistency. TCC greatly simplifies parallel software by eliminating the need for synchronization using conventional locks and semaphores, along with their complexities. TCC hardware must combine all writes from each transaction region in a program into a single packet and broadcast this packet to the permanent shared memory state atomically as a large block. This simplifies the coherence hardware because it reduces the need for small, low-latency messages and completely eliminates the need for conventional snoopy cache coherence protocols, as multiple speculatively written versions of a cache line may safely coexist within the system. Meanwhile, automatic, hardwarecontrolled rollback of speculative transactions resolves any correctness violations that may occur when several processors attempt to read and write the same data simultaneously. The cost of this simplified scheme is the higher interprocessor bandwidth.
A Case for Multi-Level Main Memory
- In Proc. of the ACM Workshop on Memory Performance Issues
, 2004
"... Current trends suggest that the number of memory chips per processor chip will increase at least a factor of ten in seven years. This will make DRAM cost, the space and the power it consumes a serious problem. The main question raised in this research is how cost, size, and power consumption can be ..."
Abstract
- Add to MetaCart
Current trends suggest that the number of memory chips per processor chip will increase at least a factor of ten in seven years. This will make DRAM cost, the space and the power it consumes a serious problem. The main question raised in this research is how cost, size, and power consumption can be reduced by transforming traditional flat main-memory systems into a multi-level hierarchy.

