Concert -- Efficient Runtime Support for Concurrent Object-Oriented Programming Languages on Stock Hardware
, 1993
Cited by 58 (11 self)
Inefficient implementations of global namespaces, message passing, and thread scheduling on stock multicomputers have prevented concurrent object-oriented programming (COOP) languages from gaining widespread acceptance. Recognizing that the architectures of stock multicomputers impose a hierarchy of costs for these operations, we have described a runtime system which provides different versions of each primitive, exposing performance distinctions for optimization. We confirm the advantages of a cost-hierarchy-based runtime system organization by showing a variation of two orders of magnitude in version costs for a CM5 implementation. Frequency measurements based on COOP application programs demonstrate that a 39% invocation cost reduction is feasible by simply selecting cheaper versions of runtime operations.
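The idea of offering several versions of one primitive and picking the cheapest one that is safe can be sketched as follows. This is only an illustration under stated assumptions: the version names, the "facts" a compiler or runtime might know, and the cost figures are all hypothetical and are not taken from the Concert system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    name: str             # hypothetical version name
    cost: int             # hypothetical cost units, not measured values
    requires: frozenset   # facts that must be known for this version to be safe


# Hypothetical versions of an "invoke" primitive, spanning two orders of magnitude.
VERSIONS = [
    Version("general_remote_invoke", 1000, frozenset()),
    Version("local_locked_invoke", 100, frozenset({"target_is_local"})),
    Version("local_direct_call", 10,
            frozenset({"target_is_local", "no_lock_needed"})),
]


def pick_version(known_facts):
    """Return the cheapest version whose requirements are all known to hold."""
    facts = set(known_facts)
    usable = [v for v in VERSIONS if v.requires <= facts]
    # The general version has no requirements, so `usable` is never empty.
    return min(usable, key=lambda v: v.cost)
```

With nothing known, `pick_version(set())` falls back to the general version; each additional fact can only unlock cheaper versions.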
Obtaining Sequential Efficiency for Concurrent Object-Oriented Languages
- In Proceedings of the ACM Symposium on the Principles of Programming Languages
, 1995
Cited by 47 (15 self)
Concurrent object-oriented programming (COOP) languages focus the abstraction and encapsulation power of abstract data types on the problem of concurrency control. In particular, pure fine-grained concurrent object-oriented languages (as opposed to hybrid or data parallel) provide the programmer with a simple, uniform, and flexible model while exposing maximum concurrency. While such languages promise to greatly reduce the complexity of large-scale concurrent programming, their popularity has been hampered by efficiency that is often many orders of magnitude lower than that of comparable sequential code. We present a sufficient set of techniques which enables the efficiency of fine-grained concurrent object-oriented languages to equal that of traditional sequential languages (like C) when the required data is available. These techniques are empirically validated by application to a COOP implementation of the Livermore Loops.
A hybrid execution model for fine-grained languages on distributed memory multicomputers
- In Proceedings of Supercomputing'95
, 1995
Cited by 30 (11 self)
While fine-grained concurrent languages can naturally capture concurrency in many irregular and dynamic problems, their flexibility has generally resulted in poor execution efficiency. In such languages the computation consists of many small threads which are created dynamically and synchronized implicitly. In order to minimize the overhead of these operations, we propose a hybrid execution model which dynamically adapts to runtime data layout, providing both sequential efficiency and low overhead parallel execution. This model uses separately optimized sequential and parallel versions of code. Sequential efficiency is obtained by dynamically coalescing threads via stack-based execution and parallel efficiency through latency hiding and cheap synchronization using heap-allocated activation frames. Novel aspects of the stack mechanism include handling return values for futures and executing forwarded messages (the responsibility to reply is passed along, like call/cc in Scheme) on the stack. In addition, the hybrid execution model is expressed entirely in C, and therefore is easily portable to many systems. Experiments with function-call intensive programs show that this model achieves sequential efficiency comparable to C programs. Experiments with regular and irregular application kernels on the CM-5 ...
Evaluation of Mechanisms for Fine-Grained Parallel Programs in the J-Machine and the CM-5
, 1993
Cited by 13 (1 self)
This paper uses an abstract machine approach to compare the mechanisms of two parallel machines: the J-Machine and the CM-5. High-level parallel programs are translated by a single optimizing compiler to a fine-grained abstract parallel machine, TAM. A final compilation step is unique to each machine and optimizes for specifics of the architecture. By determining the cost of the primitives and weighting them by their dynamic frequency in parallel programs, we quantify the effectiveness of the following mechanisms individually and in combination. Efficient processor/network coupling proves valuable. Message dispatch is found to be less valuable without atomic operations that allow the scheduling levels to cooperate. Multiple hardware contexts are of small value when the contexts cooperate and the compiler can partition the register set. Tagged memory provides little gain. Finally, the performance of the overall system is strongly influenced by the performance of the memory system and the f...
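Weighting primitive costs by dynamic frequency amounts to a weighted sum, which a minimal sketch can make concrete. The primitive names and all numbers below are hypothetical placeholders, not measurements from the J-Machine, the CM-5, or TAM.

```python
def weighted_cost(costs, freqs):
    """Total cost: sum over primitives of (cost per use) * (dynamic count).

    costs: primitive name -> cost per use (e.g. cycles) on one machine
    freqs: primitive name -> how often a program executes that primitive
    """
    missing = set(freqs) - set(costs)
    if missing:
        raise KeyError(f"no cost known for primitives: {sorted(missing)}")
    return sum(costs[p] * freqs[p] for p in freqs)


# Hypothetical per-primitive costs for two machines and one program profile.
machine_a = {"send": 10, "dispatch": 4}
machine_b = {"send": 25, "dispatch": 2}
profile = {"send": 3, "dispatch": 5}

total_a = weighted_cost(machine_a, profile)  # 10*3 + 4*5
total_b = weighted_cost(machine_b, profile)  # 25*3 + 2*5
```

Comparing `total_a` and `total_b` for the same profile shows how a mechanism that is cheap per use may still matter little if the program rarely exercises it.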
Language Features for Re-use and Extensibility in Concurrent Object-Oriented Programming Languages
, 1993
A Message-driven Programming System for Fine-grain Multicomputers
, 1994
Cited by 11 (2 self)
VIEW OF THE J-MACHINE This section provides an abstract description of the J-machine hardware. The machine provides three distinct hardware mechanisms to support concurrent programming: message queueing and dispatch, a hardware associative cache, and support for synchronization. Figure 1 provides an abstract view of the hardware that highlights these features.
The Named-State Register File
- AI-TR 1459, MIT Artificial Intelligence Laboratory
, 1993
Cited by 5 (1 self)
A register file is a critical resource of modern processors. Most hardware and software mechanisms to manage registers across procedure calls do not efficiently support multithreaded programs. To switch between parallel threads, a conventional processor must spill and reload thread contexts from registers to memory. If context switches are frequent and unpredictable, a large fraction of execution time is spent saving and restoring registers. This thesis introduces the Named-State Register File, a fine-grain, fully-associative register organization. The NSF uses hardware and software mechanisms to manage registers among many concurrent activations. The NSF enables both fast context switching and efficient sequential program performance. The NSF holds more live data than conventional register files, and requires much less spill and reload traffic to switch between concurrent active contexts. The NSF speeds execution of some sequential and parallel programs by 9% to 17% over alternative r...
Experiments with Dataflow on a General-Purpose Parallel Computer
- MIT Artificial Intelligence Laboratory Technical Memo
, 1991
Cited by 3 (3 self)
The MIT J-Machine [2], a massively-parallel computer, is an experiment in providing general-purpose mechanisms for communication, synchronization, and naming that will support a wide variety of parallel models of computation. We have developed two experimental dataflow programming systems for the J-Machine. For the first system, we adapted Papadopoulos' explicit token store [12] to implement static and then dynamic dataflow. Our second system made use of Iannucci's hybrid execution model [10] to combine several dataflow graph nodes into a single sequence, decreasing scheduling overhead. By combining the strengths of the two systems, it is possible to produce a system with competitive performance. We have demonstrated the feasibility of efficiently executing dataflow programs on a general-purpose parallel computer. Keywords: compilation, parallelization, dataflow, hybrid architectures, MIMD. The research described in this paper was supported in part by the Defense Advanced Research P...
Revised Concurrent Smalltalk Manual
, 1993
Cited by 3 (0 self)
methods are indicated by the words Abstract Method. Primitives and macros are indicated by the word Primitive or Macro; they are not first-class values and cannot be manipulated from within the language. To use a primitive such as eq as a first-class function, declare a new function feq that calls eq on its two arguments. 3.2. Syntax Tokens A Concurrent Smalltalk token is an arbitrarily long string composed of the characters A-Z, a-z, 0-9, _, !, ?, %, +, -, *, /, ., <, =, >, &, @, and ^. The characters !, ?, &, and @ may not be used at the beginning of a token, and a token may not be composed entirely of periods (.) or underscores (_). Also, tokens beginning with an underscore (_) or a percent sign (%) are reserved for system purposes and macros and should not be used by user programs. Case is not significant. A token is considered to be a number if it consists entirely of the characters 0-9, _, +, -, /, ., E, or I; it contains at least one ...
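The token rules quoted above can be checked mechanically. The sketch below is not from the manual: the function names are hypothetical, empty strings are assumed to be invalid, and "composed entirely of periods or underscores" is read as "every character is a period or an underscore". The number rule is cut off in the excerpt, so it is deliberately left out.

```python
import string

# Characters the excerpt allows inside a token.
ALLOWED = set(string.ascii_letters + string.digits + "_!?%+-*/.<=>&@^")
FORBIDDEN_FIRST = set("!?&@")   # may not begin a token
RESERVED_FIRST = set("_%")      # reserved for system purposes and macros


def is_valid_token(tok):
    """Check the character, first-character, and all-periods/underscores rules."""
    if not tok:                          # assumption: a token is non-empty
        return False
    if any(ch not in ALLOWED for ch in tok):
        return False
    if tok[0] in FORBIDDEN_FIRST:
        return False
    if all(ch in "._" for ch in tok):    # assumption: mixed "._" also counts
        return False
    return True


def is_reserved(tok):
    """True for valid tokens that user programs should not define."""
    return is_valid_token(tok) and tok[0] in RESERVED_FIRST


def same_token(a, b):
    """Case is not significant, so compare case-insensitively."""
    return a.lower() == b.lower()
```

For example, `is_valid_token("x!")` holds because `!` is only forbidden in first position, while `is_valid_token("!x")` and `is_valid_token("...")` do not.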
Incorporating Locality Management into Garbage Collection in Massively Parallel Object-Oriented Languages
- In Joint Symposium on Parallel Processing (JSPP
, 1993
Cited by 1 (0 self)
This paper discusses how locality between objects affects performance, and proposes a software architecture that enhances locality while keeping load balance reasonable with minimal runtime overhead. Objects are created locally by default, and long-lived objects are selectively migrated during garbage collection. With enhanced locality, message passing is likely to be local and objects are likely to be referenced only by local objects, so they are quickly reclaimed when they become garbage. By integrating the migration process into garbage collection, load balance is achieved and information useful for migration (e.g., reference counting) is collected at low cost during garbage collection. 1 Introduction 1.1 Why Locality is Important When we spawn a new concurrent object (task), where should the new object be located? Should the object be created on the local node, i.e., on the same node where the creator object resides, or on a remote node, i.e., which is some o...

