Results 1 - 10
of
15
Monsoon: an explicit token-store architecture
- In Proc. of the 17th Annual Int. Symp. on Comp. Arch
, 1990
"... Dataflow architectures tolerate long unpredictable com-munication delays and support generation and coordi-nation of parallel activities directly in hardware, rather than assuming that program mapping will cause these issues to disappear. However, the proposed mecha-nisms are complex and introduce n ..."
Abstract
-
Cited by 148 (12 self)
- Add to MetaCart
Dataflow architectures tolerate long unpredictable com-munication delays and support generation and coordi-nation of parallel activities directly in hardware, rather than assuming that program mapping will cause these issues to disappear. However, the proposed mecha-nisms are complex and introduce new mapping com-plications. This paper presents a greatly simplified ap-proach to dataflow execution, called the explicit token store (ETS) architecture, and its current realization in Monsoon. The essence of dynamic datallow execution is captured by a simple transition on state bits associ-ated with storage local to a processor. Low-level storage management is performed by the compiler in assigning nodes to slots in an activation frame, rather than dy-namically in hardware. The processor is simple, highly pipelined, and quite general. It may be viewed as a generalization of a fairly primitive von Neumann archi-tecture. Although the addressing capability is restric-tive, there is exactly one instruction executed for each action on the dataflow graph. Thus, the machine ori-ented ETS model provides new understanding of the merits and the real cost of direct execution of dataflow graphs. 1
Fine-grain parallelism with minimal hardware support: A compiler-controlled threaded abstract machine
- in Proceedings of the Fourth International Conference on Architectural Support for Programming Languages an Operating Systems
, 1991
"... Abstract: In this paper, we present a relatively primitive execution model for ne-grain parallelism, in which all synchronization, scheduling, and storage management is explicit and under compiler control. This is de ned by a threaded abstract machine (TAM) with a multilevel scheduling hierarchy. Co ..."
Abstract
-
Cited by 136 (6 self)
- Add to MetaCart
Abstract: In this paper, we present a relatively primitive execution model for ne-grain parallelism, in which all synchronization, scheduling, and storage management is explicit and under compiler control. This is de ned by a threaded abstract machine (TAM) with a multilevel scheduling hierarchy. Considerable temporal locality of logically related threads is demonstrated, providing an avenue for e ective register use under quasi-dynamic scheduling. A prototype TAM instruction set, TL0, has been developed, along with a translator to a variety of existing sequential and parallel machines. Compilation of Id, an extended functional language requiring ne-grain synchronization, under this model yields performance approaching that of conventional languages on current uniprocessors. Measurements suggest that the net cost of synchronization on conventional multiprocessors can be reduced to within a small factor of that on machines with elaborate hardware support, such as proposed data ow architectures. This brings into question whether tolerance to latency and inexpensive synchronization require speci c hardware support or merely an appropriate compilation strategy and program representation. 1
Compiler-Controlled Multithreading for Lenient Parallel Languages
, 1991
"... Tolerance to communication latency and inexpensive synchronization are critical for general-purpose computing on large multiprocessors. Fast dynamic scheduling is required for powerful non-strict parallel languages. However, machines that support rapid switching between multiple execution threads re ..."
Abstract
-
Cited by 56 (6 self)
- Add to MetaCart
Tolerance to communication latency and inexpensive synchronization are critical for general-purpose computing on large multiprocessors. Fast dynamic scheduling is required for powerful non-strict parallel languages. However, machines that support rapid switching between multiple execution threads remain a design challenge. This paper explores how multithreaded execution can be addressed as a compilation problem, to achieve switching rates approaching what hardware mechanisms might provide. Compiler-controlled multithreading is examined through compilation of a lenient parallel language, Id90, for a threaded abstract machine, TAM. A key feature of TAM is that synchronization is explicit and occurs only at the start of a thread, so that a simple cost model can be applied. A scheduling hierarchy allows the compiler to schedule logically related threads closely together in time and to use registers across threads. Remote communication is via message sends and split-phase memory accesses....
A compiler controlled Threaded Abstract Machine
- Journal of Parallel and Distributed Computing
, 1993
"... ..."
A Cost Analysis for a Higher-order Parallel Programming Model
, 1996
"... Programming parallel computers remains a difficult task. An ideal programming environment should enable the user to concentrate on the problem solving activity at a convenient level of abstraction, while managing the intricate low-level details without sacrificing performance. This thesis investiga ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
Programming parallel computers remains a difficult task. An ideal programming environment should enable the user to concentrate on the problem solving activity at a convenient level of abstraction, while managing the intricate low-level details without sacrificing performance. This thesis investigates a model of parallel programming based on the BirdMeertens Formalism (BMF). This is a set of higher-order functions, many of which are implicitly parallel. Programs are expressed in terms of functions borrowed from BMF. A parallel implementation is defined for each of these functions for a particular topology, and the associated execution costs are derived. The topologies which have been considered include the hypercube, 2-D torus, tree and the linear array. An analyser estimates the costs associated with different implementations of a given program and selects a cost-effective one for a given topology. All the analysis is performed at compile-time which has the advantage of reducing run-...
Evaluation of Mechanisms for Fine-Grained Parallel Programs in the J-Machine and the CM-5
, 1993
"... This paper uses an abstract machine approach to compare the mechanisms of two parallel machines: the J-Machine and the CM-5. High-level parallel programs are translated by a single optimizing compiler to a finegrained abstract parallel machine, TAM. A final compilation step is unique to each machine ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
This paper uses an abstract machine approach to compare the mechanisms of two parallel machines: the J-Machine and the CM-5. High-level parallel programs are translated by a single optimizing compiler to a finegrained abstract parallel machine, TAM. A final compilation step is unique to each machine and optimizes for specifics of the architecture. By determining the cost of the primitives and weighting them by their dynamic frequency in parallel programs, we quantify the effectiveness of the followingmechanisms individuallyand in combination. Efficient processor/network coupling proves valuable. Message dispatch is found to be less valuable without atomic operations that allow the scheduling levels to cooperate. Multiple hardware contexts are of small value when the contexts cooperate and the compiler can partition the register set. Tagged memory provides little gain. Finally, the performance of the overall system is strongly influenced by the performance of the memory system and the f...
Realtime Signal Processing -- Dataflow, Visual, and Functional Programming
, 1995
"... This thesis presents and justifies a framework for programming real-time signal processing systems. The framework extends the existing "block-diagram" programming model; it has three components: a very high-level textual language, a visual language, and the dataflow process network model of computat ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
This thesis presents and justifies a framework for programming real-time signal processing systems. The framework extends the existing "block-diagram" programming model; it has three components: a very high-level textual language, a visual language, and the dataflow process network model of computation. The dataflow process network model, although widely-used, lacks a formal description, and I provide a semantics for it. The formal work leads into a new form of actor. Having established the semantics of dataflow processes, the functional language Haskell is layered above this model, providing powerful features---notably polymorphism, higher-order functions, and algebraic program transformation---absent in block-diagram systems. A visual equivalent notation for Haskell, Visual Haskell, ensures that this power does not exclude the "intuitive" appeal of visual interfaces; with some intelligent layout and suggestive icons, a Visual Haskell program can be made to look very like a block dia...
Automatic Parallel Program Generation and Optimization from Data Decompositions
, 1991
"... this paper a general framework is described for the automatic generation of parallel programs based on a separately specified decomposition of the data. To this purpose, programs and data decompositions are expressed in a calculus, called Vcal. It is shown that by rewriting calculus expressions, Sin ..."
Abstract
-
Cited by 12 (1 self)
- Add to MetaCart
this paper a general framework is described for the automatic generation of parallel programs based on a separately specified decomposition of the data. To this purpose, programs and data decompositions are expressed in a calculus, called Vcal. It is shown that by rewriting calculus expressions, Single Program Multiple Data (SPMD) code can be generated for shared-memory as well as distributed-memory parallel processors. Optimizations are derived for certain classes of access functions to data structures, subject to block, scatter, and block/scatter decompositions. The presented calculus and transformations are language independent. 1.
A Comparison of Implicit and Explicit Parallel Programming
- University of Arizona
, 1994
"... The impact of the parallel programming model on scientific computing is examined. A comparison is made between Sisal, a functional language with implicit parallelism, and SR, an imperative language with explicit parallelism. Both languages are modern, highlevel, concurrent programming languages. Fiv ..."
Abstract
-
Cited by 11 (1 self)
- Add to MetaCart
The impact of the parallel programming model on scientific computing is examined. A comparison is made between Sisal, a functional language with implicit parallelism, and SR, an imperative language with explicit parallelism. Both languages are modern, highlevel, concurrent programming languages. Five different scientific applications were programmed in each language, and evaluated for programmability and performance. The performance of these two concurrent languages on a shared-memory multiprocessor is compared to each other and to programs written in C with parallelism provided by library calls. August 1, 1994 Department of Computer Science The University of Arizona Tucson, AZ 85721 1 A version of this paper will appear in the Journal of Parallel and Distributed Computing. 2 This research was supported by NSF grants CCR-9108412 and CDA-8822652 1 Introduction With most scientific computing applications, tension exists between the size of the problem (or the precision of the solut...
Booster: A High-Level Language for Portable Parallel Algorithms
, 1991
"... this paper, the language Booster is described. Booster is a high-level, fourth generation, parallel programming language. The language has been designed to program parallel algorithms for a wide variety of target parallel architectures. Booster has a strong separation of concerns, featuring amongst ..."
Abstract
-
Cited by 6 (5 self)
- Add to MetaCart
this paper, the language Booster is described. Booster is a high-level, fourth generation, parallel programming language. The language has been designed to program parallel algorithms for a wide variety of target parallel architectures. Booster has a strong separation of concerns, featuring amongst others a clear separation of algorithm description and algorithm decomposition and-representation. Programs written in Booster are translated to imperative languages, such as FORTRAN or C, and can be easily integrated in large applications. Parallelism can be obtained by applying data and/or code decomposition. Once algorithm and decomposition are described the transformation is done automatically

