Profiling and Optimizing Transactional Memory Applications
"... Many researchers have developed applications using transactional memory (TM) with the purpose of benchmarking different implementations, and studying whether or not TM is easy to use. However, comparatively little has been done to provide general-purpose tools for profiling and optimizing programs w ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
(Show Context)
Many researchers have developed applications using transactional memory (TM) with the purpose of benchmarking different implementations and studying whether or not TM is easy to use. However, comparatively little has been done to provide general-purpose tools for profiling and optimizing programs which use transactions. In this paper we introduce a series of profiling and optimization techniques for TM applications. The profiling techniques are of three types: (i) techniques to identify multiple potential conflicts from a single program run, (ii) techniques to identify the data structures involved in conflicts by using a symbolic path through the heap, rather than a machine address, and (iii) visualization techniques to summarize how threads spend their time and which of their transactions conflict most frequently. Altogether they provide in-depth and comprehensive information about the wasted work caused by aborting transactions. To reduce contention between transactions we suggest several TM-specific optimizations which leverage nested transactions, transaction checkpoints, and early release. To examine the effectiveness of the profiling and optimization techniques, we provide a series of illustrations from the STAMP TM benchmark suite and from the synthetic WormBench workload. First we analyze the performance of TM applications using our profiling techniques, and then we apply various optimizations to improve the performance of the Bayes, Labyrinth and Intruder applications. We discuss the design and implementation of the profiling techniques in the Bartok-STM system. We process data offline or during garbage collection, where possible, in order to minimize the probe effect introduced by profiling.
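The profiling output described above hinges on attributing aborts (wasted work) to static transaction sites. Below is a minimal software sketch of that idea, not Bartok-STM's implementation: the "transaction" is a toy optimistic retry loop over a single global version counter, and all names (TxProfile, run_tx, the two sites) are hypothetical.

```cpp
// Toy sketch: count commits and aborts per static "transaction site" so a
// profile can show which sites conflict most. Not a real STM; the
// transaction here is a version-counter lock with retries counted as aborts.
#include <atomic>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

struct TxProfile {
    std::atomic<unsigned long> commits{0};
    std::atomic<unsigned long> aborts{0};   // proxy for wasted work
};

std::atomic<unsigned long> g_version{0};    // even = stable, odd = writer active
unsigned long g_data = 0;

// Run `body` as a toy transaction; failed attempts count as aborts.
void run_tx(TxProfile& prof, const std::function<void()>& body) {
    for (;;) {
        unsigned long v = g_version.load(std::memory_order_acquire);
        if (v & 1) { prof.aborts++; continue; }           // another writer active
        if (!g_version.compare_exchange_weak(v, v + 1)) { // claim the cell
            prof.aborts++; continue;
        }
        body();                                           // do the work
        g_version.store(v + 2, std::memory_order_release);
        prof.commits++;
        return;
    }
}

int main() {
    TxProfile site_a, site_b;                 // two static transaction sites
    std::vector<std::thread> ts;
    for (int i = 0; i < 4; ++i)
        ts.emplace_back([&] {
            for (int n = 0; n < 100000; ++n) {
                run_tx(site_a, [] { g_data += 1; });
                run_tx(site_b, [] { g_data += 2; });
            }
        });
    for (auto& t : ts) t.join();
    std::printf("site A: %lu commits, %lu aborts\n",
                site_a.commits.load(), site_a.aborts.load());
    std::printf("site B: %lu commits, %lu aborts\n",
                site_b.commits.load(), site_b.aborts.load());
}
```

Sites with a high abort-to-commit ratio are the ones such a profile would flag as candidates for the optimizations the paper proposes.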
BulkSMT: Designing SMT Processors for Atomic-Block Execution
"... Multiprocessor architectures that continuously execute atomic blocks (or chunks) of instructions can improve performance and software productivity. However, all of the prior proposals for such architectures assume single-context cores as building blocks — rather than the widely-used Simultaneous Mul ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
Multiprocessor architectures that continuously execute atomic blocks (or chunks) of instructions can improve performance and software productivity. However, all of the prior proposals for such architectures assume single-context cores as building blocks, rather than the widely-used Simultaneous Multithreading (SMT) cores. As a result, they underutilize hardware resources. This paper presents the first SMT design that supports continuous chunked (or transactional) execution of its contexts. Our design, called BulkSMT, can be used either in a single-core processor or in a multicore of SMTs. We present a set of BulkSMT configurations with different cost and performance. We also describe the architectural primitives that enable chunked execution in an SMT core and in a multicore of SMTs. Our results, based on simulations of SPLASH-2 and PARSEC codes, show that BulkSMT supports chunked execution cost-effectively. In a 4-core multicore with eager chunked execution, BulkSMT reduces the execution time of the applications by an average of 26% compared to running on single-context cores. In a single core, the average reduction is 32%.
SI-TM: Reducing Transactional Memory Abort Rates through Snapshot Isolation
- In ASPLOS, 2014
"... Abstract Transactional memory represents an attractive conceptual model for programming concurrent applications. Unfortunately, high transaction abort rates can cause significant performance degradation. Conventional transactional memory realizations not only pessimistically abort transactions on e ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Transactional memory represents an attractive conceptual model for programming concurrent applications. Unfortunately, high transaction abort rates can cause significant performance degradation. Conventional transactional memory realizations not only pessimistically abort transactions on every read-write conflict, but also abort them because of false sharing, cache evictions, TLB misses, page faults, and interrupts. Consequently, the use of transactions needs to be restricted to a very small number of operations to achieve predictable performance, thereby limiting the benefit to programming simplification. In this paper, we investigate snapshot isolation transactional memory, in which transactions operate on memory snapshots that always guarantee consistent reads. By exploiting snapshots, an established database model of transactions, transactions can ignore read-write conflicts and only need to abort on write-write conflicts. Our implementation utilizes a memory controller that supports multiversion memory to efficiently support snapshotting in hardware. We show that snapshot isolation can reduce the number of aborts in some cases by three orders of magnitude and improve performance by up to 20x.
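To see why snapshot isolation aborts only on write-write conflicts, here is a toy software sketch of a multiversion store. The paper does this in a hardware memory controller; MVStore, begin, read, and commit_write are all illustrative names, not SI-TM's interface.

```cpp
// Toy snapshot isolation over a multiversion map: readers always see the
// snapshot taken at transaction begin and never abort; a writer aborts only
// if another writer committed the same key after the writer's snapshot.
#include <cstdio>
#include <iterator>
#include <map>
#include <mutex>

struct MVStore {
    std::mutex m;
    unsigned long clock = 0;                              // global commit timestamp
    std::map<int, std::map<unsigned long, int>> versions; // key -> ts -> value

    unsigned long begin() { std::lock_guard<std::mutex> g(m); return clock; }

    int read(int key, unsigned long snap) {               // never conflicts
        std::lock_guard<std::mutex> g(m);
        auto& vs = versions[key];
        auto it = vs.upper_bound(snap);                   // first version > snap
        return it == vs.begin() ? 0 : std::prev(it)->second;
    }

    // Returns false (abort) only on a write-write conflict.
    bool commit_write(int key, int value, unsigned long snap) {
        std::lock_guard<std::mutex> g(m);
        auto& vs = versions[key];
        if (!vs.empty() && vs.rbegin()->first > snap)
            return false;                                 // newer committed write exists
        vs[++clock] = value;
        return true;
    }
};

int main() {
    MVStore s;
    unsigned long t1 = s.begin(), t2 = s.begin();
    s.commit_write(42, 7, t1);                            // t1 commits a write to key 42
    std::printf("t2 read : %d (consistent snapshot view)\n", s.read(42, t2));
    std::printf("t2 write: %s\n", s.commit_write(42, 9, t2) ? "ok" : "abort (w-w)");
}
```

The second transaction reads key 42 as of its snapshot without conflicting, but its own write aborts because another writer committed after that snapshot was taken.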
Conflict Reduction in Hardware Transactions Using Advisory Locks
- In Proc. of the 27th ACM Symp. on Parallelism in Algorithms and Architectures, 2015
"... ABSTRACT Preliminary experience with hardware transactional memory suggests that aborts due to data conflicts are one of the principal obstacles to scale-up. To reduce the incidence of conflict, we propose an automatic, high-level mechanism that uses advisory locks to serialize (just) the portions ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Preliminary experience with hardware transactional memory suggests that aborts due to data conflicts are one of the principal obstacles to scale-up. To reduce the incidence of conflict, we propose an automatic, high-level mechanism that uses advisory locks to serialize (just) the portions of the transactions in which conflicting accesses occur. We demonstrate the feasibility of this mechanism, which we refer to as staggered transactions, with fully developed compiler and runtime support, running on simulated hardware. Our compiler identifies and instruments a small subset of the accesses in each transaction, which it determines, statically, are likely to constitute initial accesses to shared locations. At run time, the instrumentation acquires an advisory lock on the accessed datum, if (and only if) prior execution history suggests that the datum (or locations "downstream" of it) is indeed a likely source of conflict. Policy to drive the decision requires one hardware feature not generally found in current commercial offerings: non-transactional loads and stores within transactions. It can also benefit from a mechanism to record the program counter at which a cache line was first accessed in a transaction. Simulation results show that staggered transactions can significantly reduce the frequency of conflict aborts and increase program performance.
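A rough sketch of the staggering policy using Intel's RTM intrinsics (x86, compile with -mrtm) follows. Commodity RTM lacks the non-transactional loads and stores the paper assumes, so this sketch acquires the advisory lock before the transaction begins; the conflict-history counter, the threshold of 8, and the retry count of 4 are arbitrary stand-ins for the paper's compiler-inserted instrumentation, not its implementation.

```cpp
// Sketch: serialize only the conflict-prone portion of a transaction with an
// advisory lock when past aborts suggest the datum is contended.
#include <immintrin.h>
#include <atomic>
#include <mutex>

std::atomic<int>  hot_conflicts{0};     // crude per-datum conflict history
std::mutex        advisory;             // advisory lock for the contended datum
std::atomic<bool> fallback_held{false}; // non-speculative fallback "lock"
long shared_counter = 0;

void tx_increment() {
    // Stagger: if history says this datum conflicts often, hold the advisory
    // lock so concurrent staggered transactions serialize around it.
    std::unique_lock<std::mutex> adv(advisory, std::defer_lock);
    if (hot_conflicts.load(std::memory_order_relaxed) > 8) adv.lock();

    for (int attempt = 0; attempt < 4; ++attempt) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            if (fallback_held.load(std::memory_order_relaxed))
                _xabort(0xff);          // stay coherent with the fallback path
            ++shared_counter;           // the likely-initial conflicting access
            _xend();
            return;
        }
        hot_conflicts.fetch_add(1, std::memory_order_relaxed); // record abort
    }
    // Speculation keeps failing: take the non-speculative fallback path.
    bool expected = false;
    while (!fallback_held.compare_exchange_weak(expected, true)) expected = false;
    ++shared_counter;
    fallback_held.store(false);
}
```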
A Hardware/Software Approach for Alleviating Scalability Bottlenecks in Transactional Memory Applications
- PhD dissertation, University of Michigan, 2011
"... ii ACKNOWLEDGEMENTS There are almost too many people to thank that have helped me survive my trials through the PhD program here at the University of Michigan. I would first like to thank my advisor Professor Trevor Mudge who took a chance tak-ing me on as a student in the Fall of 2005 and providing ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
ACKNOWLEDGEMENTS There are almost too many people to thank who have helped me survive my trials through the PhD program here at the University of Michigan. I would first like to thank my advisor Professor Trevor Mudge, who took a chance taking me on as a student in the Fall of 2005 and provided funding when I was still unsure if I could survive in the program. His hands-off advising style was a perfect fit that allowed me to pursue what I found interesting, but it was still hands-on enough to help me ask the right questions about what I was doing to continue making progress toward eventually graduating. Thanks to Taeho Kgil, a senior graduate student I used as my surrogate advisor when Trevor was on sabbatical my first year. His patience in answering my endless stream of questions about all things regarding research and the state-of-the-art in computer architecture was invaluable in pointing me toward finding my niche. I would like to thank Ronald Dreslinski for spending numerous hours editing papers with me.
HARP: Adaptive Abort Recurrence Prediction for Hardware Transactional Memory
"... Abstract-Hardware Transactional Memory (HTM) exposes parallelism by allowing possibly conflicting sections of code, called transactions, to execute concurrently in multithreaded applications. However, conflicts among concurrent transactions result in wasted computation and expensive rollbacks. Unde ..."
Abstract
- Add to MetaCart
(Show Context)
Hardware Transactional Memory (HTM) exposes parallelism by allowing possibly conflicting sections of code, called transactions, to execute concurrently in multithreaded applications. However, conflicts among concurrent transactions result in wasted computation and expensive rollbacks. Under high contention, HTM protocol overheads can, in many cases, amount to several times the useful work done. Blindly scheduling transactions in the presence of contention is therefore clearly suboptimal from a resource utilization standpoint, especially in situations where several scheduling options exist. This paper presents HARP (Hardware Abort Recurrence Predictor), a hardware-only mechanism to avoid speculation when it is likely to fail. Inspired by branch prediction strategies and prior work on contention management and scheduling in HTM, HARP uses the past behavior of transactions and locality in conflicting memory references to accurately predict conflicts. The prediction mechanism adapts to variations in workload characteristics and enables better utilization of computational resources. We show that an HTM protocol that integrates HARP exhibits reductions in both wasted execution time and serialization overheads when compared to prior work, leading to a significant increase in throughput (~30%) in both single-application and multi-application scenarios.
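HARP is a hardware mechanism, but its branch-predictor-style policy can be mimicked in software: a small saturating counter per static transaction site predicts whether speculation will abort again, and a predicted-to-fail transaction goes straight to a serial path. A sketch under those assumptions, with all names (AbortPredictor, run_site) hypothetical:

```cpp
// Software analog of an abort-recurrence predictor: a 2-bit saturating
// counter per transaction site decides between speculating and serializing.
#include <algorithm>
#include <atomic>
#include <functional>
#include <mutex>

struct AbortPredictor {
    std::atomic<int> counter{0};                 // 0..3; >= 2 predicts "will abort"
    bool predict_abort() const { return counter.load() >= 2; }
    void train(bool aborted) {
        int c = counter.load();                  // racy update, like a real predictor table
        counter.store(aborted ? std::min(c + 1, 3) : std::max(c - 1, 0));
    }
};

std::mutex serial_path;                          // taken when speculation looks doomed

// try_speculative returns false if the (e.g., HTM) attempt aborted.
void run_site(AbortPredictor& pred,
              const std::function<bool()>& try_speculative,
              const std::function<void()>& serial_body) {
    if (pred.predict_abort()) {                  // skip speculation predicted to fail
        std::lock_guard<std::mutex> g(serial_path);
        serial_body();
        pred.train(false);                       // decay back toward speculating
        return;
    }
    bool aborted = !try_speculative();
    pred.train(aborted);                         // learn from this attempt
    if (aborted) {                               // fall back after a real abort
        std::lock_guard<std::mutex> g(serial_path);
        serial_body();
    }
}
```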
Supervised by
"... Multithreaded programming, which well fits the structure of modern shared memory systems, is becoming one of the most popular parallel programming models. Two important supporting components of multithreaded programming, concurrent data structures and transactional memory, can benefit from speculati ..."
Abstract
- Add to MetaCart
(Show Context)
Multithreaded programming, which fits the structure of modern shared memory systems well, is becoming one of the most popular parallel programming models. Two important supporting components of multithreaded programming, concurrent data structures and transactional memory, can benefit from speculation. However, traditional speculation mechanisms, which do not exploit high-level knowledge about the program, are inefficient in concurrent data structures and transactional memory systems. In this proposal, we bridge the gap between high-level program knowledge and speculation in multithreaded programming with the compiler's assistance. We propose a language extension to incorporate fast speculation into concurrent data structure design, and show how the language extension helps speculation on newly emerged HTM processors. In addition, to improve speculation in transactional memory systems under high data contention, two compiler optimization techniques are proposed, for STM and HTM respectively.