Master/Slave Speculative Parallelization and Approximate Code
2002
Cited by 13 (5 self)
This dissertation describes Master/Slave Speculative Parallelization (MSSP), a novel execution paradigm to improve the execution rate of sequential programs by parallelizing them speculatively for execution on a multiprocessor. In MSSP, one processor—the master—executes an approximate copy of the program to compute values the program’s execution is expected to compute. The master’s results are then checked by the slave processors by comparing them to the results computed by the original program. This validation is parallelized by cutting the program’s execution into tasks. Each slave uses its predicted inputs (as computed by the master) to validate the input predictions of the next task, inductively validating the whole execution. Approximate code, because it has no correctness requirements—in essence it is a software value predictor—can be optimized more effectively than traditionally generated code. It is free to sacrifice correctness in the uncommon case in order to maximize performance in the common case. In addition to introducing the notion of approximate code, this dissertation describes a prototype implementation of a program distiller that uses profile information to automatically generate approximate code. The distiller first applies unsafe transformations to remove uncommon case behaviors that are preventing optimization;
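The master/slave scheme described above can be sketched in a few lines. This is a minimal illustration, not the dissertation's implementation: the functions `original_step`, `approximate_step`, and `mssp_run` are hypothetical names, the task body is a toy, and the "slaves" are evaluated in a plain loop where a real system would run them on separate processors.

```python
def original_step(state):
    """Exact task body: the reference semantics of one task (toy example)."""
    if state % 1000 == 999:      # rare, uncommon-case path
        return state + 2
    return state + 1


def approximate_step(state):
    """Approximate task body: drops the rare path, so it is sometimes wrong."""
    return state + 1


def mssp_run(start, num_tasks):
    """Return (final_state, misspeculations) for a chain of num_tasks tasks."""
    state = start                # last verified state
    misspeculations = 0
    done = 0
    while done < num_tasks:
        # Master: run ahead with approximate code, predicting each task's input.
        predicted = [state]
        for _ in range(num_tasks - done):
            predicted.append(approximate_step(predicted[-1]))
        # Slaves: each task runs the original code from its predicted input.
        # These calls do not depend on each other, so they could run in parallel.
        outputs = [original_step(p) for p in predicted[:-1]]
        # Verify in order: a task's output must equal the next task's predicted input.
        for i, out in enumerate(outputs):
            done += 1
            state = out          # input was verified, so this output is correct
            if out != predicted[i + 1]:
                misspeculations += 1
                break            # squash later tasks; restart the master from here
    return state, misspeculations
```

Because every committed state comes from the original code running on a verified input, a wrong prediction costs only time: the later tasks are discarded and the master restarts from the last verified state.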
Loop Selection for Thread-Level Speculation
In Proceedings of the 18th International Workshop on Languages and Compilers for Parallel Computing, 2005
Cited by 12 (6 self)
Thread-level speculation (TLS) allows potentially dependent threads to speculatively execute in parallel, thus making it easier for the compiler to extract parallel threads. However, the high cost associated with unbalanced load, failed speculation, and inter-thread value communication makes it difficult to obtain the desired performance unless the speculative threads are carefully chosen. In this paper, we focus on extracting parallel threads from loops in general-purpose applications because loops, with their regular structures and significant coverage of execution time, are ideal candidates for extracting parallel threads. General-purpose applications, however, usually contain a large number of nested loops with unpredictable parallel performance and dynamic behavior, thus making it difficult to decide which set of loops should be parallelized to improve overall program performance. Our proposed loop selection algorithm addresses all these difficulties. We have found that (i) with the aid of profiling information, compiler analyses can achieve a reasonably accurate estimation of the performance of parallel execution, and that (ii) different invocations of a loop may behave differently, and exploiting this dynamic behavior can further improve performance. With a judicious choice of loops, we can improve the overall program performance of SPEC2000 integer benchmarks by as much as 20%.
Tolerating dependences between large speculative threads via sub-threads
In ISCA ’06, 2006
Cited by 7 (2 self)
Thread-level speculation (TLS) has proven to be a promising method of extracting parallelism from both integer and scientific workloads, targeting speculative threads that range in size from hundreds to several thousand dynamic instructions and have minimal dependences between them. Recent work has shown that TLS can offer compelling performance improvements for database workloads, but only when targeting much larger speculative threads of more than 50,000 dynamic instructions per thread, with many frequent data dependences between them. To support such large and dependent speculative threads, hardware must be able to buffer the additional speculative state, and must also address the more challenging problem of tolerating the resulting cross-thread data dependences. In this paper we present hardware support for large speculative threads that integrates several previous proposals for TLS hardware. We also introduce support for sub-threads: a mechanism for tolerating cross-thread data dependences by checkpointing speculative execution. When speculation fails due to a violated data dependence, with sub-threads the failed thread need only rewind to the checkpoint of the appropriate sub-thread rather than rewinding to the start of execution; this significantly reduces the cost of mis-speculation. We evaluate our hardware support for large and dependent speculative threads in the database domain and find that the transaction response time for three of the five transactions from TPC-C (on a simulated 4-processor chip-multiprocessor) speeds up by a factor of 1.9 to 2.9.
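The saving from rewinding to a sub-thread checkpoint instead of the thread start can be illustrated with a small calculation. This sketch assumes checkpoints at a fixed instruction interval, which is an assumption made for illustration only; the abstract does not say how checkpoints are placed, and `rewind_point` and `wasted_work` are hypothetical names.

```python
def rewind_point(violated_at, checkpoint_interval=None):
    """Instruction index a thread restarts from after a violated dependence.

    violated_at: index of the instruction that consumed the stale value.
    checkpoint_interval: assumed fixed spacing of sub-thread checkpoints;
        None models a thread without sub-threads, which restarts from 0.
    """
    if checkpoint_interval is None:
        return 0
    # Latest checkpoint at or before the offending instruction.
    return (violated_at // checkpoint_interval) * checkpoint_interval


def wasted_work(violated_at, detected_at, checkpoint_interval=None):
    """Instructions that must be re-executed once the violation is detected."""
    return detected_at - rewind_point(violated_at, checkpoint_interval)
```

For a violation on instruction 42,000 that is detected at instruction 50,000, `wasted_work(42_000, 50_000)` is 50,000 without sub-threads, while `wasted_work(42_000, 50_000, 10_000)` is 10,000 with a checkpoint every 10,000 instructions.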
Compiler Optimization of Value Communication for Thread-Level Speculation
2005
Cited by 7 (1 self)
In the context of Thread-Level Speculation (TLS), inter-thread value communication is the key to efficient parallel execution. From the compiler's perspective, TLS supports two forms of inter-thread value communication: speculation and synchronization. Speculation allows for maximum parallel overlap when it succeeds, but becomes costly when it fails.
Spice: Speculative Parallel Iteration Chunk Execution
Cited by 6 (1 self)
The recent trend in the processor industry of packing multiple processor cores in a chip has increased the importance of automatic techniques for extracting thread-level parallelism. A promising approach for extracting thread-level parallelism in general-purpose applications is to apply memory alias or value speculation to break dependences amongst threads and execute them concurrently.
A probabilistic pointer analysis for speculative optimizations
2006
Cited by 5 (0 self)
Pointer analysis is a critical compiler analysis used to disambiguate the indirect memory references that result from the use of pointers and pointer-based data structures. A conventional pointer analysis deduces for every pair of pointers, at any program point, whether a points-to relation between them (i) definitely exists, (ii) definitely does not exist, or (iii) maybe exists. Many compiler optimizations rely on accurate pointer analysis, and to ensure correctness cannot optimize in the maybe case. In contrast, recently proposed speculative optimizations can aggressively exploit the maybe case, especially if the likelihood that two pointers alias could be quantified. This dissertation proposes a Probabilistic Pointer Analysis (PPA) algorithm that statically predicts the probability of each points-to relation at every program point. Building on simple control-flow edge profiling, the analysis is both one-level context and flow sensitive, yet can still scale to large programs.
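One way to picture how edge profiles can turn a "maybe" into a probability is to merge points-to distributions at a control-flow join, weighting each incoming path by its profiled edge probability. The sketch below is an illustration under that assumption only; it is not the PPA algorithm from the dissertation, and `merge_points_to` is a hypothetical helper.

```python
def merge_points_to(incoming):
    """Combine points-to distributions at a control-flow merge.

    incoming: list of (edge_probability, {target: probability}) pairs, one per
        predecessor edge, where edge probabilities come from an edge profile
        and sum to 1.
    Returns a {target: probability} map for the pointer after the merge.
    """
    merged = {}
    for edge_prob, distribution in incoming:
        for target, prob in distribution.items():
            merged[target] = merged.get(target, 0.0) + edge_prob * prob
    return merged


# A pointer is set to &a on a path taken 90% of the time and to &b otherwise.
after_merge = merge_points_to([(0.9, {"a": 1.0}), (0.1, {"b": 1.0})])
```

A conventional analysis would report both `a` and `b` as "maybe" targets after the join; the weighted merge instead records that `a` is by far the more likely one, which is the kind of information a speculative optimization can act on.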
Safe Programmable Speculative Parallelism
Cited by 5 (0 self)
Execution order constraints imposed by dependences can serialize computation, preventing parallelization of code and algorithms. Speculating on the value(s) carried by dependences is one way to break such critical dependences. Value speculation has been used effectively at a low level, by compilers and hardware. In this paper, we focus on the use of speculation by programmers as an algorithmic paradigm to parallelize seemingly sequential code. We propose two new language constructs, speculative composition and speculative iteration. These constructs enable programmers to declaratively express speculative parallelism in programs: to indicate when and how to speculate, increasing the parallelism in the program, without concerning themselves with mundane implementation details. We present a core language with speculation constructs and mutable state and present a formal operational semantics for the language. We use the semantics to define the notion of a correct speculative execution as one that is equivalent to a non-speculative execution. In general, speculation requires a runtime mechanism to undo the effects of speculative computation in the case of mispredictions. We describe a set of conditions under which such rollback can be avoided. We present a static analysis that checks if a given program satisfies these conditions. This allows us to implement speculation efficiently, without the overhead required for rollbacks. We have implemented the speculation constructs as a C# library, along with the static checker for safety. We present an empirical evaluation of the efficacy of this approach to parallelization.
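The idea of speculative composition can be sketched as follows. This is not the paper's C# library or its API: `spec_compose` and its parameters are hypothetical, and the sketch simply assumes the consumer has no side effects, so that a wrong guess can be discarded without rollback. The paper's actual safety conditions are not given in the abstract.

```python
from concurrent.futures import ThreadPoolExecutor


def spec_compose(producer, predictor, consumer):
    """Run consumer(producer()) with the consumer started early on a guess.

    producer:  zero-argument function computing the real value.
    predictor: zero-argument function returning a cheap guess of that value.
    consumer:  one-argument function, assumed here to be free of side effects.
    """
    guess = predictor()
    with ThreadPoolExecutor(max_workers=2) as pool:
        actual_future = pool.submit(producer)
        speculative_future = pool.submit(consumer, guess)
        actual = actual_future.result()
        if actual == guess:
            # Prediction was right: the speculative result is the real result.
            return speculative_future.result()
    # Prediction was wrong: discard the speculative result and re-execute.
    return consumer(actual)
```

For example, `spec_compose(lambda: 4, lambda: 4, lambda x: x * 10)` returns 40 from the speculative run, while `spec_compose(lambda: 5, lambda: 4, lambda x: x * 10)` detects the misprediction and returns 50 from the re-execution. Either way the result equals that of the non-speculative composition.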
Improving Cache Locality for Thread-Level Speculation Systems
In IPDPS 20, 2005
Cited by 4 (3 self)
With the advent of chip-multiprocessors (CMPs), Thread-Level Speculation (TLS) remains a promising technique for exploiting this highly multithreaded hardware to improve the performance of an individual program. However, with such speculatively parallel execution the cache locality once enjoyed by the original uniprocessor execution is significantly disrupted: for TLS execution on a four-processor CMP, we find that the data-cache miss rates are nearly four times those of the uniprocessor case, even though TLS execution utilizes four private data caches.
SUDS: Automatic Parallelization for Raw Processors
2003
Cited by 4 (0 self)
A computer can never be too fast or too cheap. Computer systems pervade nearly every aspect of science, engineering, communications and commerce because they perform certain tasks at rates unachievable by any other kind of system built by humans. A computer system's throughput, however, is constrained by that system's ability to find concurrency. Given a particular target workload, the computer architect's role is to design mechanisms to find and exploit the available concurrency in that workload.
Compiler and Hardware Support for Reducing the Synchronization of Speculative Threads
Cited by 3 (0 self)
Thread-Level Speculation (TLS) allows us to automatically parallelize general-purpose programs by supporting parallel execution of threads that might not actually be independent. In this article we focus on one important limitation of program performance under TLS, which is stalls due to synchronizing and forwarding scalar values between speculative threads that would otherwise cause frequent data dependences and hence failed speculation. Using SPECint benchmarks that have been automatically transformed by our compiler to exploit TLS, we present, evaluate in detail, and compare both compiler and hardware techniques for improving the communication of scalar values. We find that through our dataflow algorithms for three increasingly aggressive instruction scheduling techniques, the compiler can drastically reduce the critical forwarding path introduced by the synchronization and forwarding of scalar values. We also show that hardware techniques for reducing synchronization can be complementary to compiler scheduling, but that the additional performance benefits are minimal and are generally not worth the cost.

