Results 1 - 3 of 3
Time-shifted modules: Exploiting code modularity for fine grain parallelism
2001
Cited by 2 (1 self)
Multi-threaded processors and chip-multiprocessors execute concurrent threads in close physical proximity, potentially reducing the cost of synchronization and communication significantly and enabling the parallelization of programs at a fine grain. In this paper, we explore a source of fine-grain parallelism present in programs due to their modular nature. Concurrency is derived from executing code within a module in parallel with the main program. Because this technique exploits the modularity of code, rather than its regularity, it is applicable to irregular, integer applications. Furthermore, because all of the synchronization is encapsulated within the module, the process of parallelization is simplified (a programmer need only consider the module’s code) and, once created, libraries of such modules can be used to create parallel programs by programmers without having to reason about race conditions. We demonstrate the technique in two case studies, achieving speedups of 26% and 39% over the single-threaded base case on a simulated SMT processor.
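The idea of keeping all synchronization inside a module can be illustrated with a small sketch. This is not the paper's implementation, which targets hardware threads on a simulated SMT processor; it is a minimal Python analogy, and the names `PrefetchingModule`, `take`, and `demo` are hypothetical. A worker thread inside the module runs ahead of the caller and fills a bounded buffer, while the caller sees only an ordinary sequential interface. Python threads are used here to show the structure only, not to demonstrate a speedup.

```python
import itertools
import queue
import threading


class PrefetchingModule:
    """Hypothetical module whose internal work runs ahead of its caller."""

    def __init__(self, produce, depth=4):
        self._produce = produce
        # Bounded buffer of precomputed results; the queue provides all locking.
        self._ready = queue.Queue(maxsize=depth)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Executes concurrently with the main program.
        # put() blocks once the buffer is full, so the worker stays a bounded
        # distance ahead of the caller.
        while True:
            self._ready.put(self._produce())

    def take(self):
        # Sequential-looking interface: the caller never touches a lock.
        # Blocks only if the worker has fallen behind.
        return self._ready.get()


def demo():
    counter = itertools.count()
    # Only the worker thread calls the producer, so results arrive in order.
    module = PrefetchingModule(lambda: next(counter) ** 2, depth=2)
    return [module.take() for _ in range(5)]


if __name__ == "__main__":
    print(demo())
```

Code that calls `take()` does not need to reason about race conditions, because the only shared state is the queue owned by the module.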
Cell GC: using the cell synergistic processor as a garbage collection coprocessor
In VEE ’08: Proceedings of the Fourth ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
2008
Cited by 2 (0 self)
In recent years, scaling of single-core superscalar processor performance has slowed due to complexity and power considerations. To improve program performance, designs are increasingly adopting chip multiprocessing with homogeneous or heterogeneous CMPs. By trading off features from a modern aggressive superscalar core, CMPs often offer better scaling characteristics in terms of aggregate performance, complexity and power, but often require additional software investment to rewrite, retune or recompile programs to take advantage of the new designs. The Cell Broadband Engine is a modern example of a heterogeneous CMP with coprocessors (accelerators) which can be found in supercomputers (Roadrunner), blade servers (IBM QS20/21), and video game consoles (SCEI PS3). A Cell BE processor has a host Power RISC processor (PPE) and eight Synergistic Processor Elements (SPE), each consisting of a Synergistic

