EXECUTING PARALLEL PROGRAMS WITH SYNCHRONIZATION BOTTLENECKS EFFICIENTLY
Cited by 9 (3 self)
We propose a scheme within which parallel programs with potential synchronization bottlenecks run efficiently. In the straightforward implementations which use basic locking schemes, the execution time for the program parts with bottlenecks increases significantly when the number of processors increases. Our scheme makes the parallel performance for the bottleneck parts of programs close to the sequential performance while maintaining the efficiency with which the nonbottleneck parts run. Experiments with a 64-processor SMP and a 128-processor DSM machine confirmed that parallel programs implemented with our scheme perform much better than parallel programs implemented with other widely-used locking schemes.
An Implementation and Performance Evaluation of Language with Fine-Grain Thread Creation on Shared Memory Parallel Computer
, 1998
Cited by 4 (4 self)
We implemented two applications with irregular parallelism in (1) C and a thread library and (2) our concurrent language Schematic, which supports efficient fine-grain dynamic thread creation and dynamic load balancing. We compared the two approaches, focusing on program description cost and performance. Schematic not only achieves common programming practices seen in C, such as task queue management, with much smaller description cost, but also incorporates some advanced optimizations for synchronization, such as inter-thread communication on registers. The case study shows that Schematic can describe irregular applications more naturally and can achieve high performance: Schematic runs about 2.8 times slower than C in a sequential environment, and its speedup in a 64-processor environment is comparable to that of C.
Online Computation of Critical Paths for Multithreaded Languages
- In Proceedings of the 5th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2000), volume 1800 of Lecture Notes in Computer Science
, 2000
Cited by 2 (2 self)
We have developed an instrumentation scheme that enables programs written in multithreaded languages to compute a critical path at runtime. Our scheme gives not only the length (execution time) of the critical path but also the lengths and locations of all the subpaths making up the critical path. Although the scheme is like Cilk's algorithm in that it uses a "longest path" computation, it allows more flexible synchronization. We implemented our scheme on top of the concurrent object-oriented language Schematic and confirmed its effectiveness through experiments on a 64-processor symmetric multiprocessor.
1 Introduction
The scalability expected in parallel programming is often not obtained in the first run, and then performance tuning is necessary. In the early stages of this tuning it is very useful to know what the critical path is and how long it is. The length of an execution path is defined as the amount of time needed to execute it, and the critical path is the longest o...
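As a rough illustration of the "longest path" idea mentioned in this abstract, the sketch below computes a critical path offline over a small task graph. It is not the paper's online instrumentation scheme: the function name `critical_path`, the task names, and the representation of dependencies as a dictionary are all assumptions made for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, deps):
    """Return (length, path) of the longest path through a task DAG.

    durations: hypothetical mapping task -> execution time
    deps:      hypothetical mapping task -> iterable of predecessor tasks
    """
    graph = {task: deps.get(task, ()) for task in durations}
    finish = {}  # earliest finish time of each task
    prev = {}    # predecessor on the longest path into each task
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        best = max(graph[task], key=lambda p: finish[p], default=None)
        start = finish[best] if best is not None else 0
        finish[task] = start + durations[task]
        prev[task] = best
    # The critical path ends at the task that finishes last.
    end = max(finish, key=finish.get)
    length = finish[end]
    path = []
    while end is not None:
        path.append(end)
        end = prev[end]
    return length, path[::-1]


durations = {"a": 2, "b": 3, "c": 1, "d": 4}
deps = {"c": ["a", "b"], "d": ["c"]}
print(critical_path(durations, deps))  # (8, ['b', 'c', 'd'])
```

Here task `c` waits for both `a` and `b`, so the longer predecessor `b` lies on the critical path. An online scheme such as the one the abstract describes would instead propagate these values while the program runs.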
Reasoning-conscious Meta-object Design of a Reflective Concurrent Language
, 1997
Cited by 1 (0 self)
Computational reflection gives programming languages high flexibility, which is useful for parallel/distributed programming. On the other hand, its interpreter-based execution model makes efficient implementation difficult. In particular, meta-objects in concurrent languages are described with explicit state transitions, which makes program reasoning (such as partial evaluation) difficult. In this paper, we propose a new meta-object design that exploits reader/writer methods in our concurrent object-oriented language Schematic. The crux of the design is the separation of state-related operations from others, which allows us to optimize meta-objects using an existing partial evaluator because most methods in the meta-objects can be regarded as a sequential program.
1 Introduction
1.1 Reflection in Parallel/Distributed Programs
Practical parallel and distributed programs often have complicated computation and communication structures for achieving efficiency, fault-tolerance, portabili...
Achieving High Performance for Parallel Programs that Contain Unscalable Modules
, 2000
Cited by 1 (1 self)
This thesis is a description of a compiler and runtime technique for the efficient management of threads including their mutual exclusion. The target area for this work is parallel languages for shared-memory multiprocessors. The goal of this work is to achieve a situation in which the execution time either decreases or remains unchanged as the number of processors is increased. We call this performance model the satisfactory performance model. Existing parallel programming systems do not always perform according to this satisfactory model. This is the case when there are modules in the program such that concurrent invocations of the modules are serialized. We call these modules bottleneck modules. When bottleneck modules are present they prevent operation according to the satisfactory performance model since the overhead incurred because of bottleneck modules increases with the number of processors. This overhead includes communications with memory for the sharing of memory objects am...
Fusion of Concurrent Invocations
- Transactions of Information Processing Society of Japan, 42(SIG 2 (PRO 9)):13-25, February 2001. In Japanese
, 2001
This paper describes a mechanism for "fusing" concurrent invocations of exclusive methods. The target of our work is object-oriented languages with concurrent extensions. In these languages, concurrent invocations of exclusive methods are serialized; only one invocation executes immediately and the others wait for their turn. The mechanism fuses multiple waiting invocations into a cheaper operation such as a single invocation.
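To make the idea of fusing waiting invocations concrete, here is a minimal sketch using a shared counter. It is an illustration only, not the mechanism from the paper, which targets language implementations. The class `FusingCounter`, its `add` method, and the `updates_applied` field are hypothetical names introduced for this example.

```python
import threading


class FusingCounter:
    """Counter whose waiting add() calls are fused into a single update."""

    def __init__(self):
        self.value = 0
        self.updates_applied = 0   # number of times value was actually written
        self._pending = []         # amounts registered but not yet applied
        self._pending_lock = threading.Lock()
        self._exclusive = threading.Lock()  # serializes the exclusive method

    def add(self, amount):
        # Register the request first, so whoever holds the exclusive lock
        # next can apply it on this caller's behalf.
        with self._pending_lock:
            self._pending.append(amount)
        with self._exclusive:
            with self._pending_lock:
                batch, self._pending = self._pending, []
            if batch:
                # All waiting invocations collapse into one addition.
                self.value += sum(batch)
                self.updates_applied += 1


counter = FusingCounter()
threads = [
    threading.Thread(target=lambda: [counter.add(1) for _ in range(1000)])
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 8000
```

Every amount is applied exactly once, either by its own caller or by an earlier lock holder that drained the pending list. Under contention, `updates_applied` can be smaller than the number of `add` calls, which is the saving that fusion is after.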

