Results 1 - 10
of
83
Monsoon: an explicit token-store architecture
- In Proc. of the 17th Annual Int. Symp. on Comp. Arch
, 1990
"... Dataflow architectures tolerate long unpredictable com-munication delays and support generation and coordi-nation of parallel activities directly in hardware, rather than assuming that program mapping will cause these issues to disappear. However, the proposed mecha-nisms are complex and introduce n ..."
Abstract
-
Cited by 148 (12 self)
- Add to MetaCart
Dataflow architectures tolerate long unpredictable com-munication delays and support generation and coordi-nation of parallel activities directly in hardware, rather than assuming that program mapping will cause these issues to disappear. However, the proposed mecha-nisms are complex and introduce new mapping com-plications. This paper presents a greatly simplified ap-proach to dataflow execution, called the explicit token store (ETS) architecture, and its current realization in Monsoon. The essence of dynamic datallow execution is captured by a simple transition on state bits associ-ated with storage local to a processor. Low-level storage management is performed by the compiler in assigning nodes to slots in an activation frame, rather than dy-namically in hardware. The processor is simple, highly pipelined, and quite general. It may be viewed as a generalization of a fairly primitive von Neumann archi-tecture. Although the addressing capability is restric-tive, there is exactly one instruction executed for each action on the dataflow graph. Thus, the machine ori-ented ETS model provides new understanding of the merits and the real cost of direct execution of dataflow graphs. 1
Fine-grain parallelism with minimal hardware support: A compiler-controlled threaded abstract machine
- in Proceedings of the Fourth International Conference on Architectural Support for Programming Languages an Operating Systems
, 1991
"... Abstract: In this paper, we present a relatively primitive execution model for ne-grain parallelism, in which all synchronization, scheduling, and storage management is explicit and under compiler control. This is de ned by a threaded abstract machine (TAM) with a multilevel scheduling hierarchy. Co ..."
Abstract
-
Cited by 136 (6 self)
- Add to MetaCart
Abstract: In this paper, we present a relatively primitive execution model for ne-grain parallelism, in which all synchronization, scheduling, and storage management is explicit and under compiler control. This is de ned by a threaded abstract machine (TAM) with a multilevel scheduling hierarchy. Considerable temporal locality of logically related threads is demonstrated, providing an avenue for e ective register use under quasi-dynamic scheduling. A prototype TAM instruction set, TL0, has been developed, along with a translator to a variety of existing sequential and parallel machines. Compilation of Id, an extended functional language requiring ne-grain synchronization, under this model yields performance approaching that of conventional languages on current uniprocessors. Measurements suggest that the net cost of synchronization on conventional multiprocessors can be reduced to within a small factor of that on machines with elaborate hardware support, such as proposed data ow architectures. This brings into question whether tolerance to latency and inexpensive synchronization require speci c hardware support or merely an appropriate compilation strategy and program representation. 1
The Multiscalar Architecture
, 1993
"... The centerpiece of this thesis is a new processing paradigm for exploiting instruction level parallelism. This paradigm, called the multiscalar paradigm, splits the program into many smaller tasks, and exploits fine-grain parallelism by executing multiple, possibly (control and/or data) depen-dent t ..."
Abstract
-
Cited by 113 (8 self)
- Add to MetaCart
The centerpiece of this thesis is a new processing paradigm for exploiting instruction level parallelism. This paradigm, called the multiscalar paradigm, splits the program into many smaller tasks, and exploits fine-grain parallelism by executing multiple, possibly (control and/or data) depen-dent tasks in parallel using multiple processing elements. Splitting the instruction stream at statically determined boundaries allows the compiler to pass substantial information about the tasks to the hardware. The processing paradigm can be viewed as extensions of the superscalar and multiprocess-ing paradigms, and shares a number of properties of the sequential processing model and the dataflow processing model. The multiscalar paradigm is easily realizable, and we describe an implementation of the multis-calar paradigm, called the multiscalar processor. The central idea here is to connect multiple sequen-tial processors, in a decoupled and decentralized manner, to achieve overall multiple issue. The mul-tiscalar processor supports speculative execution, allows arbitrary dynamic code motion (facilitated by an efficient hardware memory disambiguation mechanism), exploits communication localities, and does all of these with hardware that is fairly straightforward to build. Other desirable aspects of the
The Expandable Split Window Paradigm for Exploiting Fine-Grain Parallelism
- In Proceedings of the 19th Annual International Symposium on Computer Architecture
, 1992
"... We propose a new processing paradigm, called the Expandable Split Window (ESW) paradigm, for exploiting finegrain parallelism. This paradigm considers a window of instructions (possibly having dependencies) as a single unit, and exploits fine-grain parallelism by overlapping the execution of multipl ..."
Abstract
-
Cited by 110 (10 self)
- Add to MetaCart
We propose a new processing paradigm, called the Expandable Split Window (ESW) paradigm, for exploiting finegrain parallelism. This paradigm considers a window of instructions (possibly having dependencies) as a single unit, and exploits fine-grain parallelism by overlapping the execution of multiple windows. The basic idea is to connect multiple sequential processors, in a decoupled and decentralized manner, to achieve overall multiple issue. This processing paradigm shares a number of properties of the restricted dataflow machines, but was derived from the sequential von Neumann architecture. We also present an implementation of the Expandable Split Window execution model, and preliminary performance results. 1. INTRODUCTION The execution of a program, in an abstract form, can be considered to be a dynamic dataflow graph that encapsulates the data dependencies in the program. The nodes of the graph represent computation operations, and the arcs of the graph represent communication o...
Comparative Evaluation of Latency Reducing and Tolerating Techniques
- In Proceedings of the 18th Annual International Symposium on Computer Architecture
, 1991
"... Techniques that can cope with the large latency of memory accesses are essential for achieving high processor utilization in large-scale shared-memory multiprocessors. In this paper, we consider four architectural techniques that address the latency problem: (i) hardware coherent caches, (ii) relaxe ..."
Abstract
-
Cited by 103 (6 self)
- Add to MetaCart
Techniques that can cope with the large latency of memory accesses are essential for achieving high processor utilization in large-scale shared-memory multiprocessors. In this paper, we consider four architectural techniques that address the latency problem: (i) hardware coherent caches, (ii) relaxed memory consistency, (iii) softwarecontrolled prefetching, and (iv) multiple-context support. While some studies of benefits of the individual techniques have been done, no study evaluates all of the techniques within a consistent framework. This paper attempts to remedy this by providing a comprehensive evaluation of the benefits of the four techniques, both individually and in combinations, using a consistent set of architectural assumptions. The results in this paper have been obtained using detailed simulations of a large-scale shared-memory multiprocessor. Our results show that caches and relaxed consistency uniformly improve performance. The improvements due to prefetching and multiple contexts are sizeable, but are much more applicationdependent. Combinations of the various techniques generally attain better performance than each one on its own. Overall, we show that using suitable combinations of the techniques, performance can be improved by 4 to 7 times.
Space-Efficient Scheduling of Multithreaded Computations
- SIAM Journal on Computing
, 1993
"... . This paper considers the problem of scheduling dynamic parallel computations to achieve linear speedup without using significantly more space per processor than that required for a single-processor execution. Utilizing a new graph-theoretic model of multithreaded computation, execution efficiency ..."
Abstract
-
Cited by 72 (11 self)
- Add to MetaCart
. This paper considers the problem of scheduling dynamic parallel computations to achieve linear speedup without using significantly more space per processor than that required for a single-processor execution. Utilizing a new graph-theoretic model of multithreaded computation, execution efficiency is quantified by three important measures: T 1 is the time required for executing the computation on 1 processor, T1 is the time required by an infinite number of processors, and S 1 is the space required to execute the computation on 1 processor. A computation executed on P processors is time-efficient if the time is O(T 1 =P + T1 ), that is, it achieves linear speedup when P = O(T 1 =T1 ), and it is space-efficient if it uses O(S 1 P ) total space, that is, the space per processor is within a constant factor of that required for a 1-processor execution. The first result derived from this model shows that there exist multithreaded computations such that no execution schedule can simultan...
Multithreading: A Revisionist View of Dataflow Architectures
- IN PROCEEDINGS OF THE 18TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE
, 1991
"... Although they are powerful intermediate representations for compilers, pure dataflow graphs are incomplete, and perhaps even undesirable, machine languages. They are incomplete because it is hard to encode critical sections and imperative operations which are essential for the efficient execution of ..."
Abstract
-
Cited by 67 (1 self)
- Add to MetaCart
Although they are powerful intermediate representations for compilers, pure dataflow graphs are incomplete, and perhaps even undesirable, machine languages. They are incomplete because it is hard to encode critical sections and imperative operations which are essential for the efficient execution of operating system functions, such as resource management. They may be undesirable because they imply a uniform dynamic scheduling policy for all instructions, preventing a compiler from expressing a static schedule which could result in greater run time efficiency, both by reducing redundant operand synchronization, and by using high speed registers to communicate state between instructions. In this paper, we develop a new machine-level programming model which builds upon two previous improvements to the dataflow execution model: sequential scheduling of instructions, and multiported registers for expression temporaries. Surprisingly, these improvements have required almost no architectural...
Compiler-Controlled Multithreading for Lenient Parallel Languages
, 1991
"... Tolerance to communication latency and inexpensive synchronization are critical for general-purpose computing on large multiprocessors. Fast dynamic scheduling is required for powerful non-strict parallel languages. However, machines that support rapid switching between multiple execution threads re ..."
Abstract
-
Cited by 56 (6 self)
- Add to MetaCart
Tolerance to communication latency and inexpensive synchronization are critical for general-purpose computing on large multiprocessors. Fast dynamic scheduling is required for powerful non-strict parallel languages. However, machines that support rapid switching between multiple execution threads remain a design challenge. This paper explores how multithreaded execution can be addressed as a compilation problem, to achieve switching rates approaching what hardware mechanisms might provide. Compiler-controlled multithreading is examined through compilation of a lenient parallel language, Id90, for a threaded abstract machine, TAM. A key feature of TAM is that synchronization is explicit and occurs only at the start of a thread, so that a simple cost model can be applied. A scheduling hierarchy allows the compiler to schedule logically related threads closely together in time and to use registers across threads. Remote communication is via message sends and split-phase memory accesses....
*T: A Multithreaded Massively Parallel Architecture
- IN PROCEEDINGS OF THE 19TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE
, 1992
"... What should the architecture of each node in a general purpose, massively parallel architecture (MPA) be? We frame the question in concrete terms by describing two fundamental problems that must be solved well in any general purpose MPA. From this, we systematically develop the required logical orga ..."
Abstract
-
Cited by 47 (1 self)
- Add to MetaCart
What should the architecture of each node in a general purpose, massively parallel architecture (MPA) be? We frame the question in concrete terms by describing two fundamental problems that must be solved well in any general purpose MPA. From this, we systematically develop the required logical organization of an MPA node, and present some details of *T (pronounced Start), a concrete architecture designed to these requirements. *T is a direct descendant of dynamic dataflow architectures, and unifies them with von Neumann architectures. We discuss a hand-compiled example and some compilation issues.
Advances in dataflow programming languages
- ACM Comput. Surv
, 2004
"... Abstract. Many developments have taken place within dataflow programming languages in the past decade. In particular, there has been a great deal of activity and advancement in the field of dataflow visual programming languages. The motivation for this article is to review the content of these recen ..."
Abstract
-
Cited by 42 (0 self)
- Add to MetaCart
Abstract. Many developments have taken place within dataflow programming languages in the past decade. In particular, there has been a great deal of activity and advancement in the field of dataflow visual programming languages. The motivation for this article is to review the content of these recent developments and how they came

