An Object-Oriented Concurrent Reflective Language ABCL/R3
, 2000
Cited by 56 (11 self)
This article presents the design principles and efficient implementation techniques for ABCL/R3, an object-oriented concurrent reflective language. One of the most distinctive features of ABCL/R3 is its use of compilation techniques based on partial evaluation, which effectively remove interpretation from meta-level programs. The meta-level objects are designed so that they can be partially evaluated in an effective manner. Benchmark programs show that our compilation frameworks make object execution drastically faster than interpreter-based implementations and achieve performance close to that of nonreflective compilers.
Lazy Threads: Implementing a Fast Parallel Call
- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1996
Cited by 50 (3 self)
In this paper we describe lazy threads, a new approach for implementing multi-threaded execution models on conventional machines. We show how they can implement a parallel call at nearly the efficiency of a sequential call. The central idea is to specialize the representation of a parallel call so that it can execute as a parallel-ready sequential call. This allows excess parallelism to degrade into sequential calls, with the attendant efficient stack management and direct transfer of control and data, yet a call that truly needs to execute in parallel gets its own thread of control. The efficiency of lazy threads is achieved through careful attention to storage management and a code generation strategy that allows us to represent potential parallel work with no overhead.
A hybrid execution model for fine-grained languages on distributed memory multicomputers
- In Proceedings of Supercomputing'95
, 1995
Cited by 30 (11 self)
While fine-grained concurrent languages can naturally capture concurrency in many irregular and dynamic problems, their flexibility has generally resulted in poor execution efficiency. In such languages the computation consists of many small threads which are created dynamically and synchronized implicitly. In order to minimize the overhead of these operations, we propose a hybrid execution model which dynamically adapts to runtime data layout, providing both sequential efficiency and low overhead parallel execution. This model uses separately optimized sequential and parallel versions of code. Sequential efficiency is obtained by dynamically coalescing threads via stack-based execution and parallel efficiency through latency hiding and cheap synchronization using heap-allocated activation frames. Novel aspects of the stack mechanism include handling return values for futures and executing forwarded messages (the responsibility to reply is passed along, like call/cc in Scheme) on the stack. In addition, the hybrid execution model is expressed entirely in C, and therefore is easily portable to many systems. Experiments with function-call intensive programs show that this model achieves sequential efficiency comparable to C programs. Experiments with regular and irregular application kernels on the CM-5 ...
Runtime Mechanisms for Efficient Dynamic Multithreading
, 1996
Cited by 22 (9 self)
High performance on distributed memory machines for programming models with dynamic thread creation and multithreading requires efficient thread management and communication. Traditional multithreading runtimes, consisting of few general-purpose, bundled mechanisms that assume minimal compiler and hardware support, are suitable for computations involving coarse-grained threads but provide low efficiency in the presence of small granularity threads and irregular communication behavior. We describe two mechanisms of the Illinois Concert runtime system which address this shortcoming. The first, hybrid stack-heap execution, exploits close coupling with the compiler to dynamically form coarse-grained execution units; threads are lazily created as required by runtime situations. The second, pull messaging, exploits hardware support to implement a distributed message queue with receiver-initiated data transfer, delivering robust performance across a wide range of dynamic communication charact...
Fine-grain Multithreading with Minimal Compiler Support -- A Cost Effective Approach to Implementing Efficient Multithreading Languages
- PLDI'97
, 1997
Cited by 20 (5 self)
It is difficult to map the execution model of multithreading languages (languages which support fine-grain dynamic thread creation) onto the single stack execution model of C. Consequently, previous work on efficient multithreading uses elaborate frame formats and allocation strategy, with compilers customized for them. This paper presents an alternative cost-effective implementation strategy for multithreading languages which can maximally exploit current sequential C compilers. We identify a set of primitives whereby efficient dynamic thread creation and switch can be achieved and clarify implementation issues and solutions which work under the stack frame layout and calling conventions of current C compilers. The primitives are implemented as a C library and named StackThreads. In StackThreads, a thread creation is done just by a C procedure call, maximizing thread creation performance. When a procedure suspends an execution, the context of the procedure, which is roughly a stack frame of the procedure, is saved into heap and resumed later. With StackThreads, the compiler writer can straightforwardly translate sequential constructs of the source language into corresponding C statements or expressions, while using StackThreads primitives as a blackbox mechanism which switches execution between C procedures.
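The core idea of the abstract above (a thread starts as an ordinary call, and its context moves to the heap only if it suspends) can be illustrated with a minimal sketch. This is not the StackThreads C library or its API: Python generators stand in for stack frames, and the names `Cell`, `fork`, `put`, and `schedule` are hypothetical helpers invented for this illustration.

```python
from collections import deque


class Cell:
    """Hypothetical single-assignment cell that a procedure may block on."""

    def __init__(self):
        self.full = False
        self.value = None
        self.waiters = deque()  # suspended contexts, kept on the heap


ready = deque()  # contexts that can be resumed, paired with the value to deliver


def run(frame, value=None):
    """Run a context until it finishes or blocks on an empty cell."""
    try:
        cell = frame.send(value)
        while cell.full:  # value already there: keep running inline
            cell = frame.send(cell.value)
    except StopIteration:
        return
    cell.waiters.append(frame)  # callee suspended: save its context for later


def fork(proc, *args):
    """Create a thread as if it were a plain call; detach only on suspension."""
    run(proc(*args))


def put(cell, value):
    """Fill a cell and make every context waiting on it ready."""
    cell.full, cell.value = True, value
    while cell.waiters:
        ready.append((cell.waiters.popleft(), value))


def schedule():
    """Resume saved contexts until none are ready."""
    while ready:
        frame, value = ready.popleft()
        run(frame, value)


def demo():
    log = []

    def consumer(cell, name):
        log.append(f"{name} start")
        value = yield cell  # may suspend here
        log.append(f"{name} got {value}")

    c = Cell()
    fork(consumer, c, "a")  # cell is empty, so "a" suspends
    log.append("caller continues")
    put(c, 42)
    schedule()  # resumes "a"
    fork(consumer, c, "b")  # cell is full, so "b" runs to completion inline
    return log
```

In this sketch, `fork` costs no more than a call when the callee never blocks (as for "b"), and the caller simply continues when the callee suspends (as for "a"), which mirrors the behaviour the abstract describes at a much higher level of abstraction.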
StackThreads/MP: Integrating Futures into Calling Standards
- PPOPP'99
, 1999
Cited by 19 (5 self)
An implementation scheme of fine-grain multithreading that needs no changes to current calling standards for sequential languages and modest extensions to sequential compilers is described. Like previous similar systems, it performs an asynchronous call as if it were an ordinary procedure call, and detaches the callee from the caller when the callee suspends or either of them migrates to another processor. Unlike previous similar systems, it detaches and connects arbitrary frames generated by off-the-shelf sequential compilers obeying calling standards. As a consequence, it requires neither a frontend preprocessor nor a native code generator that has a builtin notion of parallelism. The system practically works with unmodified GNU C compiler (GCC). Desirable extensions to sequential compilers for guaranteeing portability and correctness of the scheme are clarified and claimed modest. Experiments indicate that sequential performance is not sacrificed for practical applications and both sequential and parallel performance are comparable to Cilk[B], whose current implementation requires a fairly sophisticated preprocessor to C. These results show that efficient asynchronous calls (a.k.a. future calls) can be integrated into current calling standard with a very small impact both on sequential performance and compiler engineering.
Schematic: A Concurrent Object-Oriented Extension to Scheme
- In Proceedings of Workshop on Object-Based Parallel and Distributed Computation, number 1107 in Lecture Notes in Computer Science
, 1996
Cited by 18 (12 self)
A concurrent object-oriented extension to the programming language Scheme, called Schematic, is described. Schematic supports familiar constructs often used in typical parallel programs (future and higher-level macros such as plet and pbegin), which are actually defined atop a very small number of fundamental primitives. In this way, Schematic achieves both the convenience for typical concurrent programming and simplicity and flexibility of the language kernel. Schematic also supports concurrent objects which exhibit more natural and intuitive behavior than the "bare" (unprotected) shared memory, and permit intra-object concurrency. Schematic will be useful for intensive parallel applications on parallel machines or networks of workstations, concurrent graphical user interface programming, distributed programming over network, and even concurrent shell programming.
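To illustrate how higher-level constructs such as plet and pbegin can be layered on a single future primitive, here is a minimal sketch. It is written in Python rather than Scheme, and the exact semantics of Schematic's constructs are not given in the abstract, so the behaviour of `future`, `pbegin`, and `plet` below is an assumption chosen for illustration only.

```python
import threading


class Future:
    """Assumed primitive: evaluate a thunk concurrently, read the result later."""

    def __init__(self, thunk):
        self._done = threading.Event()
        self._value = None
        self._error = None
        threading.Thread(target=self._run, args=(thunk,), daemon=True).start()

    def _run(self, thunk):
        try:
            self._value = thunk()
        except Exception as exc:  # remember the failure for the reader
            self._error = exc
        finally:
            self._done.set()

    def touch(self):
        """Block until the value is available, then return it (or re-raise)."""
        self._done.wait()
        if self._error is not None:
            raise self._error
        return self._value


def future(thunk):
    return Future(thunk)


def pbegin(*thunks):
    """Assumed meaning: run all thunks concurrently, return the last value."""
    futures = [future(t) for t in thunks]
    results = [f.touch() for f in futures]
    return results[-1] if results else None


def plet(bindings, body):
    """Assumed meaning: evaluate all bindings concurrently, then run the body."""
    futures = {name: future(thunk) for name, thunk in bindings.items()}
    return body(**{name: f.touch() for name, f in futures.items()})
```

For example, `plet({"x": lambda: 2, "y": lambda: 3}, lambda x, y: x * y)` evaluates both bindings in separate threads and then multiplies the results. Both helpers are a few lines each because all concurrency and synchronization live in the one primitive, which is the layering the abstract points to.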
An Effective Garbage Collection Strategy for Parallel Programming Languages on Large Scale Distributed-Memory Machines
, 1997
Cited by 13 (2 self)
This paper describes the design and implementation of a garbage collection scheme on large-scale distributed-memory computers and reports various experimental results. The collector is based on the conservative GC library by Boehm & Weiser. Each processor traces local pointers using the GC library while traversing remote pointers by exchanging "mark messages" between processors. It exhibits promising performance: in the most space-intensive settings we tested, the total collection overhead ranges from 5% up to 15% of the application running time (excluding idle time). We not only examine basic performance figures such as the total overhead or latency of a global collection, but also demonstrate how local collection scheduling strategies affect application performance. In our collector, a local collection is scheduled either independently or synchronously. Experimental results show that the benefit of independent local collections has been overstated in the literature. Independent l...
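The marking scheme in the abstract above (local tracing plus "mark messages" for remote pointers) can be sketched as a single-process simulation. This is an illustrative model only, not the paper's collector: the heap layout, the `distributed_mark` function, and the round-based loop are assumptions, conservative pointer scanning is not modelled, and real distributed termination detection is replaced by a global check that all inboxes are empty.

```python
from collections import deque


def distributed_mark(heaps, roots):
    """Simulate a global mark phase.

    heaps: {processor: {object: [(owner_processor, object), ...]}}
    roots: {processor: [object, ...]}
    Returns {processor: set of marked (reachable) local objects}.
    """
    marked = {p: set() for p in heaps}
    # Each processor's inbox holds local objects it has been asked to mark.
    inbox = {p: deque(roots.get(p, [])) for p in heaps}

    # Stand-in for termination detection: stop when no messages are pending.
    while any(inbox.values()):
        for p in heaps:
            stack = list(inbox[p])
            inbox[p].clear()
            while stack:  # trace local pointers without communication
                obj = stack.pop()
                if obj in marked[p]:
                    continue
                marked[p].add(obj)
                for owner, target in heaps[p][obj]:
                    if owner == p:
                        stack.append(target)
                    else:
                        inbox[owner].append(target)  # send a "mark message"
    return marked


def example():
    heaps = {
        0: {"a": [(0, "b"), (1, "c")], "b": [], "x": []},
        1: {"c": [(0, "b")], "d": [(1, "c")]},
    }
    return distributed_mark(heaps, {0: ["a"]})
```

In `example`, object "c" on processor 1 is reached only through a mark message from processor 0, while "x" and "d" stay unmarked and would be reclaimed by the subsequent sweep.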
Supporting High Level Programming with High Performance: The Illinois Concert System
- In Proceedings of the Second International Workshop on High-level Parallel Programming Models and Supportive Environments
, 1997
Cited by 11 (4 self)
Programmers of concurrent applications are faced with a complex performance space in which data distribution and concurrency management exacerbate the difficulty of building large, complex applications. To address these challenges, the Illinois Concert system provides a global namespace, implicit concurrency control and granularity management, implicit storage management, and object-oriented programming features. These features are embodied in a language ICC++ (derived from C++) which has been used to build a number of kernels and applications. As high level features can potentially incur overhead, the Concert system employs a range of compiler and runtime optimization techniques to efficiently support the high level programming model. The compiler techniques include type inference, inlining and specialization; and the runtime techniques include caching, prefetching and hybrid stack/heap multithreading. The effectiveness of these techniques permits the construction of complex parallel ...
An Efficient Compilation Framework for Languages Based on a Concurrent Process Calculus
, 1997
Cited by 10 (7 self)
We propose a framework for compiling programming languages based on concurrent process calculi, in which computation is expressed by a combination of processes and communication channels. Our framework realizes compile-time process scheduling and unboxed channels. The compile-time scheduling enables us to execute multiple independent processes without a scheduling pool operation. Unboxed channels allow us to create a channel without memory allocations and to communicate values on registers. The framework is given as a set of translation rules from a concurrent calculus to an ML-like sequential program. Experimental results show that our compiler can execute sequential programs written in the process calculus only a few times slower than equivalent C programs. This indicates that pure process calculi like ours and programming languages based on them can be implemented efficiently, without losing their simplicity, purity, and elegance.

