Compiler Optimization of Scalar Value Communication Between Speculative Threads
- In Proceedings of the 10th ASPLOS, 2002
Cited by 56 (17 self)
While there have been many recent proposals for hardware that supports Thread-Level Speculation (TLS), there has been relatively little work on compiler optimizations to fully exploit this potential for parallelizing programs optimistically. In this paper, we focus on one important limitation of program performance under TLS, which is stalls due to forwarding scalar values between threads that would otherwise cause frequent data dependences. We present and evaluate dataflow algorithms for three increasingly aggressive instruction scheduling techniques that reduce the critical forwarding path introduced by the synchronization associated with this data forwarding. In addition, we contrast our compiler techniques with related hardware-only approaches. With our most aggressive compiler and hardware techniques, we improve performance under TLS by 6.2% to 28.5% for 6 of 14 applications, and by at least 2.7% for half of the other applications.
Improving value communication for thread-level speculation
- In Proceedings of the 8th HPCA, 2002
Cited by 53 (10 self)
Thread-Level Speculation (TLS) allows us to automatically parallelize general-purpose programs by supporting parallel execution of threads that might not actually be independent. In this paper, we show that the key to good performance lies in the three different ways to communicate a value between speculative threads: speculation, synchronization, and prediction. The difficult part is deciding how and when to apply each method. This paper shows how we can apply value prediction, dynamic synchronization, and hardware instruction prioritization to improve value communication and hence performance in several SPECint benchmarks that have been automatically transformed by our compiler to exploit TLS. We find that value prediction can be effective when properly throttled to avoid the high costs of misprediction, while most of the gains of value prediction can be more easily achieved by exploiting silent stores. We also show that dynamic synchronization is quite effective for most benchmarks, while hardware instruction prioritization is not. Overall, we find that these techniques have great potential for improving the performance of TLS.
The Jrpm System for Dynamically Parallelizing Java Programs
- In Proceedings of the 30th International Symposium on Computer Architecture, 2003
Cited by 50 (4 self)
We describe the Java runtime parallelizing machine (Jrpm), a complete system for parallelizing sequential programs automatically. Jrpm is based on a chip multiprocessor (CMP) with thread-level speculation (TLS) support. CMPs have low sharing and communication costs relative to traditional multiprocessors, and TLS simplifies program parallelization by allowing us to parallelize optimistically without violating correct sequential program behavior. Using a Java virtual machine with dynamic compilation support coupled with a hardware profiler, speculative buffer requirements and inter-thread dependencies of prospective speculative thread loops (STLs) are analyzed in real-time to identify the best loops to parallelize. Once sufficient data has been collected to make a reasonable decision, selected loops are dynamically recompiled to run in parallel. Experimental results demonstrate that Jrpm can exploit thread-level parallelism with minimal effort from the programmer. On four processors, we achieved speedups of 3 to 4 for floating point applications, 2 to 3 on multimedia applications, and between 1.5 and 2.5 on integer applications. Performance was achieved by automatic selection of thread decompositions by the hardware profiler, intra-procedural optimizations on code compiled dynamically into speculative threads, and some minor programmer transformations for exposing parallelism that cannot be performed automatically.
Compiler Optimization of Value Communication for Thread-Level Speculation
- 2005
Cited by 7 (1 self)
In the context of Thread-Level Speculation (TLS), inter-thread value communication is the key to efficient parallel execution. From the compiler's perspective, TLS supports two forms of inter-thread value communication: speculation and synchronization. Speculation allows for maximum parallel overlap when it succeeds, but becomes costly when it fails.
Hardware Support for Thread-Level Speculation
- 2003
Novel architectures that support multithreading, such as chip multiprocessors, have become increasingly commonplace over the past decade: examples include the Sun MAJC, IBM Power4, Alpha 21464, Intel Xeon, and HP PA-8800. However, only workloads composed of independent threads can take advantage of these processors; to improve the performance of a single application, that application must be transformed into a parallel version. Unfortunately, the process of parallelization is extremely difficult: the compiler must prove that potential threads are independent, which is not possible for many general-purpose programs (e.g., spreadsheets, web software, and graphics codes) due to their abundant use of pointers, complex control flow, and complex data structures. This dissertation investigates hardware support for Thread-Level Speculation (TLS), a technique which empowers the compiler to optimistically create parallel threads despite uncertainty as to whether those threads are actually independent.
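The idea shared by the abstracts above, running threads optimistically and recovering when they turn out to be dependent, can be illustrated with a small software toy. This is only a sketch under stated assumptions: it is not the hardware mechanism or the compiler from any of the listed papers, and the names `execute`, `run_speculatively`, `make_square`, and `make_accumulate` are hypothetical. Each loop iteration (an "epoch") runs against a snapshot of memory with buffered writes, epochs commit in their original order, and an epoch that read a location written by an earlier epoch is squashed and re-executed.

```python
def execute(body, view):
    """Run one epoch against `view`, buffering writes and logging reads."""
    reads, writes = set(), {}

    def read(addr):
        if addr in writes:          # the epoch sees its own buffered stores
            return writes[addr]
        reads.add(addr)             # a read exposed to earlier epochs
        return view[addr]

    def write(addr, value):
        writes[addr] = value        # buffered until commit

    body(read, write)
    return reads, writes


def run_speculatively(memory, epochs):
    """Toy TLS round; returns how many epochs had to be re-executed."""
    # Phase 1: every epoch runs "in parallel" on the same starting state.
    snapshot = dict(memory)
    logs = [execute(body, snapshot) for body in epochs]

    # Phase 2: commit in sequential order, squashing epochs that read
    # a location some earlier epoch has since written.
    dirty, violations = set(), 0
    for body, (reads, writes) in zip(epochs, logs):
        if reads & dirty:
            violations += 1
            # Re-execute non-speculatively on the committed state.
            reads, writes = execute(body, memory)
        memory.update(writes)
        dirty |= writes.keys()
    return violations


def make_square(i):
    """Independent iteration: out[i] = in[i] ** 2."""
    def body(read, write):
        write(("out", i), read(("in", i)) ** 2)
    return body


def make_accumulate(i):
    """Dependent iteration: sum += in[i] (a scalar carried across epochs)."""
    def body(read, write):
        write("sum", read("sum") + read(("in", i)))
    return body
```

With four inputs 1 through 4, the squaring loop commits with no violations, while the accumulation loop squashes every epoch after the first because each one reads `sum` before its predecessor has committed. That second case is the kind of frequent scalar dependence for which the abstracts discuss synchronization (forwarding the value) or prediction in place of plain speculation.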

