Design space exploration of a software speculative parallelization scheme
- IEEE Transactions on Parallel and Distributed Systems, 2005
Cited by 16 (3 self)
Abstract: With speculative parallelization, code sections that cannot be fully analyzed by the compiler are optimistically executed in parallel. Hardware schemes are fast but expensive and require modifications to the processors and/or memory system. Software schemes require no changes to the hardware of existing shared-memory systems, but can suffer from significant overheads involved with the speculative execution. In fact, the performance of software schemes is highly dependent on application characteristics, the design and implementation of the scheme, and the system configuration and size. This paper explores the design space of a recently proposed software speculative parallelization scheme. In the process, we gain insight into the most beneficial features of software schemes for speculative parallelization, as well as the most influential application characteristics. For instance, experimental results show that, contrary to intuition, checking for data dependence violations on every speculative store, as opposed to at commit time, leads to little performance degradation in the worst case and to significantly better performance with large configurations. Also, scheduling policies based on windows can perform very close to fully dynamic policies with a fraction of the memory overhead. Finally, experimental results show consistent speedups in the execution of loops that cannot be parallelized at compile time, both with and without RAW data dependences, for 4 to 32 processors.
Index Terms: Speculative parallelization, thread-level speculation, parallel architectures.
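The contrast the abstract draws between checking on every speculative store and checking at commit time can be illustrated with a minimal sketch. This is not the scheme evaluated in the paper: the class `SpecThread`, the functions `spec_load`, `spec_store_eager`, and `check_at_commit`, and the bookkeeping they use are all hypothetical names chosen for illustration, and threads are simulated sequentially rather than run in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class SpecThread:
    """Hypothetical per-thread speculative state (illustrative only)."""
    order: int  # position of the thread's block in sequential order
    # address -> order of the thread whose version was read (-1 = main memory)
    exposed_reads: dict = field(default_factory=dict)
    writes: dict = field(default_factory=dict)  # address -> speculative value
    squashed: bool = False


def spec_load(threads, t, addr, memory):
    """Return t's own version, else the closest predecessor's, else memory."""
    if addr in t.writes:
        return t.writes[addr]
    preds = [p for p in threads if p.order < t.order and addr in p.writes]
    if preds:
        src = max(preds, key=lambda p: p.order)
        t.exposed_reads[addr] = src.order
        return src.writes[addr]
    t.exposed_reads[addr] = -1
    return memory[addr]


def _stale_readers(threads, t, addr):
    """Successors that read addr from a version older than t's store (RAW)."""
    return [s for s in threads
            if s.order > t.order
            and addr in s.exposed_reads
            and s.exposed_reads[addr] < t.order]


def spec_store_eager(threads, t, addr, value):
    """Store, then check for violations immediately (check on every store)."""
    t.writes[addr] = value
    violated = _stale_readers(threads, t, addr)
    for s in violated:
        s.squashed = True
    return [s.order for s in violated]


def check_at_commit(threads, t):
    """Lazy alternative: scan t's whole write set only when t commits."""
    violated = {}
    for addr in t.writes:
        for s in _stale_readers(threads, t, addr):
            s.squashed = True
            violated[s.order] = s
    return sorted(violated)


if __name__ == "__main__":
    memory = {"x": 0}
    threads = [SpecThread(i) for i in range(3)]
    spec_load(threads, threads[2], "x", memory)          # thread 2 reads early
    print(spec_store_eager(threads, threads[0], "x", 5))  # thread 0 writes later
```

In this sketch the eager variant identifies the offending successor as soon as the predecessor stores, so the successor can be restarted before it does more wasted work, while the commit-time variant defers the same check until the predecessor finishes.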
Meseta: A New Scheduling Strategy for Speculative Parallelization of Randomized Incremental Algorithms
- Proc. 34th Int’l Conf. Parallel Processing (ICPP ’05) Workshops, Seventh Workshop High Performance Scientific and Eng. Computing (HPSEC ’05), 2005
"... incremental algorithms ..."
New scheduling strategies for randomized incremental algorithms in the context of speculative parallelization
- IEEE Transactions on Computers
Cited by 1 (0 self)
Abstract: In this work, we address the problem of scheduling loops with dependences in the context of speculative parallelization. We show that the scheduling alternatives are highly influenced by the dependence violation pattern the code presents. We center our analysis on those algorithms where dependences are less likely to appear as the execution proceeds. In particular, we focus on randomized incremental algorithms, widely used as a much more efficient solution to many problems than their deterministic counterparts. These important algorithms are, in general, hard to parallelize by hand and represent a challenge for any automatic parallelization scheme. Our analysis led us to the development of MESETA, a new scheduling strategy that takes into account the probability of a dependence violation to determine the number of iterations being scheduled. MESETA is compared with existing techniques, including Fixed-Size Chunking (FSC), the only scheduling alternative used so far in the context of speculative parallelization. Our experimental results show a 5.5 percent to 36.25 percent speedup improvement over FSC, leading to a better extraction of the parallelism inherent to randomized incremental algorithms. Moreover, when the cost of dependence violations is too high to obtain speedups, MESETA curbs the performance degradation.
Index Terms: Parallelism and concurrency, load balancing and task assignment, scheduling and task partitioning, geometrical problems and computations.
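The idea of letting the violation probability drive the number of iterations scheduled can be sketched as follows. This is not the MESETA formula, which the abstract does not give: the decay model in `violation_probability`, its parameter `k`, and the sizing rule in `probability_aware_chunks` are assumptions made purely for illustration. Only Fixed-Size Chunking, which always issues blocks of the same length, is taken directly from the text.

```python
def fixed_size_chunks(n_iters, chunk):
    """Fixed-Size Chunking (FSC): every scheduled block has the same length."""
    return [min(chunk, n_iters - start) for start in range(0, n_iters, chunk)]


def violation_probability(i, k=4.0):
    """Assumed model: violations become rarer as the iteration index i grows."""
    return min(1.0, k / (i + 1))


def probability_aware_chunks(n_iters, max_chunk, prob=violation_probability):
    """Hypothetical rule: issue larger blocks as the violation estimate falls."""
    sizes, start = [], 0
    while start < n_iters:
        p = prob(start)
        # Scale the block by the estimated chance of no violation; never below 1.
        size = max(1, int(max_chunk * (1.0 - p)))
        size = min(size, n_iters - start)  # do not run past the loop bound
        sizes.append(size)
        start += size
    return sizes


if __name__ == "__main__":
    print(fixed_size_chunks(100, 16))
    print(probability_aware_chunks(100, 16))
```

Under the assumed model, early iterations are issued in very small blocks, so a squash discards little work, and later iterations are issued in blocks close to `max_chunk`, which reduces scheduling overhead once violations are unlikely.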
Thread-level Speculative Parallelization
The basic idea behind speculative parallelization (also called thread-level speculation) [2, 6, 7] is to assign the execution of different blocks of consecutive iterations to different threads, running each one on its own processor. While execution proceeds, a software monitor ensures that no thread consumes an incorrect version of a value that should be calculated by a predecessor, which would violate sequential semantics. If such a dependence violation occurs, the monitor stops the parallel execution of the offending threads, discards the incorrectly calculated iterations, and restarts their execution using the correct values. Figure 1 shows an example of speculative parallel execution of a loop with dependences. The detection of dependence violations can be done either by hardware or by software. Hardware solutions [4, 5] rely on additional hardware modules to detect dependences, while software methods [2, 6, 7] augment the original loop with new instructions that check for violations during the parallel execution.
The author’s visits to EPCC, made possible by the TRACS and HPC-Europa programmes, led to a successful collaboration with Dr. Marcelo Cintra, of the Division of Informatics, in the field of speculative parallelization. We have developed a new software-only speculative parallelization engine that automatically executes in parallel sequential loops with few or no dependences among iterations [1, 2, 3]. The main advantage of this solution is that it makes it possible for a compiler to parallelize an iterative application automatically, thus obtaining speedups in

