Advanced concurrency control for transactional memory using transaction commit rate
In EUROPAR ’08: Fourteenth European Conference on Parallel Processing, 2008
Cited by 5 (4 self)
Abstract. Concurrency control for Transactional Memory (TM) is investigated as a means of improving resource usage by dynamically adjusting the number of threads concurrently executing transactions. The proposed control system takes the measured Transaction Commit Rate as feedback to adjust the concurrency. Through an extensive evaluation, a new Concurrency Control Algorithm (CCA), called P-only Concurrency Control (PoCC), is shown to perform better than our other four proposed CCAs on a synthetic benchmark and on the STAMP and Lee-TM benchmarks.
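The feedback loop described in this abstract can be illustrated with a minimal proportional-only sketch. This is not the published PoCC algorithm: the function name `p_only_step`, the set point, the gain, the thread bounds, and the definition of commit rate as commits divided by commits plus aborts are all assumptions made for illustration.

```python
def p_only_step(threads, commits, aborts,
                setpoint=0.7, gain=1.0,
                min_threads=1, max_threads=8):
    """Return a new thread count from one sampling period.

    All parameters and defaults are hypothetical; the abstract only states
    that the measured commit rate is used as feedback.
    """
    total = commits + aborts
    if total == 0:
        # No transactions finished in this period: keep the current setting.
        return threads
    commit_rate = commits / total          # assumed definition of commit rate
    error = commit_rate - setpoint         # positive: room for more threads
    delta = round(gain * error * threads)  # proportional term only
    # Clamp the result to the allowed range of thread counts.
    return max(min_threads, min(max_threads, threads + delta))
```

With these assumed defaults, a high commit rate raises the thread count and a low commit rate lowers it, always within the configured bounds.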
An Object-Aware Hardware Transactional Memory System
Cited by 2 (1 self)
Transactional Memory (TM) is receiving attention as a way of expressing parallelism for programming multi-core systems. As a parallel programming model it is able to avoid the complexity of conventional locking. TM can enable multi-core hardware that dispenses with conventional bus-based cache coherence, resulting in simpler and more extensible systems. This is increasingly important as we move into the many-core era. Within TM, however, the processes of conflict detection and committing still require synchronization and the broadcast of data. By increasing the granularity of when synchronization is required, the demands on communication are reduced. Software implementations of TM have taken advantage of the fact that the object structure of data can be employed to further raise the level at which interference is observed. The contribution of this paper is the first hardware TM approach where the object structure is recognized and harnessed. This leads to novel commit and conflict detection mechanisms, and also to an elegant solution to the virtualization of version management, without the need for additional software TM support. A first implementation of the proposed hardware TM system is simulated. The initial evaluation is conducted with three benchmarks derived from the STAMP suite and a transactional version of Lee’s routing algorithm.
Experiences using Adaptive Concurrency in Transactional Memory with Lee’s Routing Algorithm
Cited by 1 (0 self)
Experience in profiling Lee’s routing algorithm, a new complex TM application, showed that transactional applications may exhibit dynamic exploitable parallelism, i.e. the amount of useful parallelism available at any point in time varies during the execution of the application. Obviously, executing too many transactions at times when the available parallelism is low will lead to high contention and wasted computation in aborted transactions, and vice versa. Current Transactional Memory (TM) implementations do not account for this behavior. This work employs adaptive concurrency to dynamically adjust the number of threads executing transactions concurrently. Our preliminary evaluation is performed in DSTM2 using Lee’s routing algorithm, both of which were simple to modify to enable adaptive concurrency, and shows significant reduction in resource usage, and modest performance gains.
Robust Adaptation to Available Parallelism in Transactional Memory Applications
Abstract. Applications using transactional memory may exhibit fluctuating (dynamic) available parallelism, i.e. the maximum number of transactions that can be committed concurrently may change over time. Executing large numbers of transactions concurrently in phases with low available parallelism will waste processor resources in aborted transactions, while executing few transactions concurrently in phases with high available parallelism will degrade execution time by not fully exploiting the available parallelism. Three questions come to mind: (1) Are there such transactional applications? (2) How can such behaviour be exploited? and (3) How can available parallelism be measured or calculated efficiently? The contributions of this paper constitute the answers to these questions. This paper presents a system, called transactional concurrency tuning, that adapts the number of transactions executing concurrently in response to dynamic available parallelism, in order to improve processor resource usage and execution time performance. Four algorithms, called controller models, that vary in response strength were presented in previous work and shown to maintain execution time similar to the best case non-tuned execution time, but improve resource usage significantly in benchmarks that exhibit dynamic available parallelism. This paper presents an analysis of the four controller models' response characteristics to changes in dynamic available parallelism, and identifies weaknesses that reduce their general applicability. These limitations lead to the design of a fifth controller model, called P-only transactional concurrency tuning (PoCC). Evaluation of PoCC shows it improves upon the performance and response characteristics of the first four controller models, making it a robust controller model suitable for general use.
Improving Performance by Reducing Aborts in Hardware Transactional Memory
Abstract. The optimistic nature of Transactional Memory (TM) systems can lead to the concurrent execution of transactions that are later found to conflict. Conflicts degrade scalability, and may lead to aborts that increase wasted work and degrade performance. A promising approach to reducing conflicts at runtime is dynamically, and transparently, reordering the execution of transactions upon discovery of conflicts. This approach has been explored in Software TMs (STMs), but not in Hardware TMs (HTMs). Furthermore, STM implementations of this approach cannot be ported to HTMs easily. This paper investigates the feasibility of such reordering in HTMs, and presents two designs that are scalable, independent of the on-chip interconnect, require only minor modifications to each core, and add no execution overhead if no conflicts occur. The evaluation takes LogTM-SE as a baseline and considers benchmarks with different levels of contention (transactional conflicts). The results show that the preferred design increases HTM performance by up to 17% when contention is low, 57% when contention is high, and never degrades performance. Finally, the designs are orthogonal to LogTM-SE; they require no modification to cache structures, and continue to support transaction virtualization, open and closed unbounded nesting, paging, thread suspension, and thread migration.
• Figure 1: Lee's routing algorithm [2], for some datasets, showed
• Multicores add a new requirement to mainstream programming: parallel and scalable software.
• Currently, such software is built using fine-grain locks, but they are challenging to use in building robust and correct software.
• Transactional memory is an alternative to fine-grain locks that aims to be easier, while maintaining performance.
2. Motivating Adaptive Concurrency Control
On the Characterisation of Complex Transactional Memory Applications
Preprint Series CSPP-44, 2008
Transactional Memory (TM) has become an active research area as it promises to simplify the development of highly scalable parallel programs. Scalability is quickly becoming an essential software requirement as successive commodity processors integrate ever larger numbers of cores. However, complex TM applications to test TM implementations have only recently begun to emerge, and their execution characteristics have not been fully investigated. Complicating matters further, the complex TM applications have been written in different programming languages, using different TM implementations, making comparisons difficult. We have ported several complex TM applications to a single TM implementation, and built into it a framework to profile their execution. This paper presents performance figures and execution characteristics of major complex TM applications on up to 8 processors and, because all of them execute under a single TM implementation, presents directly comparable performance figures and execution characteristics for the first time. In addition, the priority contention manager is found to provide the best overall results for these applications, in contrast to previously published results that suggest the polka contention manager gives the best overall results.
View-Oriented Transactional Memory
http://www.cs.otago.ac.nz/research/techreports.php
Abstract. This paper proposes a View-Oriented Transactional Memory (VOTM) model to seamlessly integrate different concurrency control methods, including locking mechanisms and transactional memory. The model allows programmers to partition the shared memory into “views”, which are non-overlapping sets of shared data objects. A Restricted Admission Control (RAC) scheme is proposed to control the number of processes accessing each view in order to reduce the number of transaction aborts. The RAC scheme has the merits of both the locking mechanism and transactional memory. Experimental results demonstrate that VOTM outperforms traditional transactional memory models such as TinySTM by up to five times. VOTM also outperforms pure lock-based models in applications with long critical sections and has comparable performance with lock-based models in other cases. Keywords: View-Oriented Transactional Memory (VOTM), transactional memory, deadlock, concurrency control, Restricted
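The idea of limiting how many processes may enter a view at once can be sketched with a counting semaphore. This is only an illustration under stated assumptions, not the VOTM implementation: the class `View`, its `quota` parameter, and the helper `peak_occupancy` are hypothetical names, and threads stand in for processes.

```python
import threading
import time


class View:
    """Hypothetical view guarded by an admission quota (not the VOTM API)."""

    def __init__(self, quota):
        self.quota = quota
        # At most `quota` threads may be inside the view at the same time.
        self._admission = threading.BoundedSemaphore(quota)

    def __enter__(self):
        self._admission.acquire()  # block until an admission slot is free
        return self

    def __exit__(self, exc_type, exc, tb):
        self._admission.release()  # free the slot for a waiting thread
        return False


def peak_occupancy(quota, workers):
    """Run `workers` threads against one view; return the peak number inside."""
    view = View(quota)
    state = {"inside": 0, "peak": 0}
    guard = threading.Lock()  # protects the bookkeeping counters only

    def worker():
        with view:
            with guard:
                state["inside"] += 1
                state["peak"] = max(state["peak"], state["inside"])
            time.sleep(0.01)  # stand-in for transactional work on the view
            with guard:
                state["inside"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]
```

In this sketch a quota of 1 makes admission behave like a mutual-exclusion lock, while a larger quota admits several concurrent accessors, which is one way to read the abstract's claim that the scheme combines the merits of both approaches.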
TrC-MC: Decentralized Software Transactional Memory for Multi-Multicore Computers
Abstract. To achieve single-lock atomicity in software transactional memory systems, the commit procedure often goes through a common clock variable. When there are frequent transactional commits, clock sharing becomes inefficient. Tremendous cache contention takes place between the processors, and the computing throughput no longer scales with processor count. Therefore, traditional transactional memories are unable to accelerate applications with frequent commits regardless of thread count. While systems with decentralized data structures perform better on these applications, we argue they are incomplete, as they create many more aborts than traditional transactional systems. In this paper we apply two design changes, namely zone partitioning and timestamp extension, to optimize an existing decentralized algorithm. We prove its correctness and evaluate it on benchmark programs with frequent transactional commits. We find it as much as several times faster than the state-of-the-art software transactional memory system. We have also reduced the abort rate of the system to an acceptable level.

