Polling Watchdog: Combining Polling and Interrupts for Efficient Message Handling
- in Proceedings of the 23rd Annual International Symposium on Computer Architecture
, 1995
Cited by 45 (11 self)
Parallel systems supporting multithreading, or message passing in general, have typically used either polling or interrupts to handle incoming messages. Neither approach is ideal; either may lead to excessive overheads or message-handling latencies, depending on the application. This paper investigates a combined approach, where both are used depending on the circumstances. In the Polling Watchdog, a simple hardware extension limits the generation of interrupts to the cases where explicit polling fails to handle the message quickly. As an added benefit, this mechanism also has the potential to simplify the interaction between interrupts and the network accesses performed by the program. We present a message-handling mechanism designed for the EARTH-MANNA-S system, an implementation of the EARTH execution model on the MANNA multiprocessor. In contrast to the original EARTH-MANNA system, this system does not use a dedicated communication processor. Rather, synchronization and communicat...
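The combined approach can be illustrated with a toy simulation. This is a minimal sketch, not the EARTH-MANNA-S hardware: the `PollingWatchdog` class, the `timeout` parameter (in abstract clock ticks), and the `simulate` helper are all hypothetical names, and the assumption is that a timer runs while a message waits and an interrupt is raised only if no poll picks the message up before the timer expires.

```python
from dataclasses import dataclass, field


@dataclass
class PollingWatchdog:
    """Toy model: an interrupt fires only when a message waits too long."""
    timeout: int                                  # hypothetical limit, in ticks
    pending: list = field(default_factory=list)   # arrival tick of each waiting message
    handled_by_poll: int = 0
    handled_by_interrupt: int = 0

    def arrive(self, now: int) -> None:
        # A message reaches the network interface at tick `now`.
        self.pending.append(now)

    def poll(self) -> None:
        # Explicit poll by the program: drain every waiting message.
        self.handled_by_poll += len(self.pending)
        self.pending.clear()

    def tick(self, now: int) -> None:
        # Watchdog check: interrupt only for messages that waited `timeout` ticks.
        overdue = [t for t in self.pending if now - t >= self.timeout]
        self.handled_by_interrupt += len(overdue)
        self.pending = [t for t in self.pending if now - t < self.timeout]


def simulate(arrivals, poll_ticks, timeout, horizon):
    """Return (messages handled by polling, messages handled by interrupt)."""
    wd = PollingWatchdog(timeout)
    arrivals, poll_ticks = set(arrivals), set(poll_ticks)
    for now in range(horizon):
        if now in arrivals:
            wd.arrive(now)
        if now in poll_ticks:
            wd.poll()
        wd.tick(now)
    return wd.handled_by_poll, wd.handled_by_interrupt
```

With a message arriving at tick 0 and a poll at tick 2, polling wins and no interrupt is raised; a message arriving at tick 10 with no poll until tick 30 falls back to the interrupt path once the assumed timeout of 5 ticks elapses.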
A Superstrand Architecture
, 1997
Cited by 10 (3 self)
In this paper, we present the superstrand architecture and its underlying execution model. A superstrand architecture exploits the notion of a strand: a block of instructions grouped together by a compiler to become a scheduling quantum of execution. A strand is enabled at runtime if all necessary dependence (data and control) constraints are satisfied. The code is partitioned into strands so that (1) the source and destination of a "long latency" data dependence are placed into different strands; (2) the source (e.g. a test) and destinations of a branch operation are placed into different strands. We show that partitioning strands in this way gives the hardware access to a pool of strands resulting in a large "window" of instructions, which is necessary for sufficient instruction level parallelism, but reduces the number of potential dependences which the hardware must check, speculate and resolve at runtime, leading to significant savings in hardware complexity. The main re...
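The two partitioning rules can be sketched on a linear instruction list. This is only an illustration under stated assumptions, not the paper's compiler algorithm: the function name `partition_into_strands`, the `(name, kind, is_branch_target)` tuple layout, and the choice of `"load"` as the only long-latency kind are all hypothetical. The sketch closes a strand after every long-latency producer or branch and opens a new strand at every branch target, which is a coarse way to keep producers, tests, and their destinations in different strands.

```python
def partition_into_strands(instrs, long_latency=frozenset({"load"})):
    """Split a linear instruction list into strands.

    Each instruction is (name, kind, is_branch_target). A strand is closed
    after a long-latency producer or a branch, and a new strand is opened
    at every branch target.
    """
    strands, current = [], []
    for name, kind, is_target in instrs:
        if is_target and current:
            # A branch destination must not share a strand with earlier code.
            strands.append(current)
            current = []
        current.append(name)
        if kind in long_latency or kind == "branch":
            # Consumers and branch destinations start in a later strand.
            strands.append(current)
            current = []
    if current:
        strands.append(current)
    return strands


code = [
    ("ld r1", "load", False),
    ("add r2", "alu", False),
    ("cmp", "alu", False),
    ("beq L", "branch", False),
    ("sub", "alu", False),
    ("L: mul", "alu", True),
]
strands = partition_into_strands(code)
```

Here the load sits alone in the first strand, so any consumer of `r1` is necessarily in a later one, and the branch target `L: mul` begins its own strand.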
Thread Partitioning and Scheduling Based on Cost Model
, 1997
Cited by 10 (7 self)
There has been considerable interest in implementing a multithreaded program execution and architecture model on a multiprocessor whose primary processors consist of today's off-the-shelf microprocessors. Unlike some custom-designed multithreaded processor architectures, which can interleave multiple threads concurrently, conventional processors can only execute one thread at a time. This presents a unique and challenging problem to the compiler: partition a program into threads so that it executes both correctly and in minimal time. We present a new heuristic algorithm based on an interesting extension of the classical list scheduling algorithm. Based on a cost model, our algorithm groups instructions into threads by considering the trade-offs among parallelism, latency tolerance, thread switching costs and sequential execution efficiency. The proposed algorithm has been implemented, and its performance measured through experiments on a variety of architecture parameters a...
On Memory Models and Cache Management for Shared-Memory Multiprocessors
, 1995
Cited by 8 (4 self)
A popular approach to designing shared-memory computer systems is to specify a memory model upon which a variety of program execution models may be implemented. Alternatively, one may choose a desired program execution model (PXM) and specify a memory model suited to the PXM. We argue that this second approach is to be preferred because it avoids the trap of specifying features of the memory model (consistency, for example) that may not be needed to implement a desired program execution model. If the PXM is a dataflow model (one based on or equivalent to recursive dataflow program graphs), then no cache consistency problem need arise if the memory model supports synchronizing memory operations. Then why use a memory consistency model as a basis for designing shared-memory multiprocessors? One argument is that a general memory model can support a variety of PXMs. However, many good PXMs, object-oriented programming, for example, may be built on top of a basic program model that d...
Nomadic Threads: A Migrating Multithreaded Approach to Remote Memory Accesses in Multiprocessors
- In Proc. of the 1996 Conf. on Parallel Architectures and Compilation Techniques (PACT'96)
, 1996
Cited by 7 (0 self)
Machine (TAM) [14] is a software implementation of a multithreaded architecture that runs on conventional distributed-memory computers. It executes compiler-generated threads in parallel and emulates I-structure operations for array handling. A TAM thread is a collection of sequential instructions that do not jump out of the thread and only reference data available in the current frame, though they may issue I-structure fetches. Results from I-structure fetches and data from other threads are placed into inlets in the frame. Each thread has a set of inlets that, when full, allows the thread to become enabled. Results of a thread may be sent to the inlets of other threads. Cilk [15] is another software multithreading system that uses threads specified in a modified C language. A closure, which stores the inputs of a thread, is ready to run when all its argument slots are full. Ready closures may be stolen by idle processors to balance the load. 2.3. Thread migration Thread mig...
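The closure behaviour described above (ready once every argument slot is full, and stealable by an idle processor) can be sketched as follows. This is a minimal illustration, not Cilk's runtime: the `Closure` and `Worker` classes and their methods are hypothetical names, and the choice to steal the oldest ready closure is an assumption made for the example.

```python
from collections import deque


class Closure:
    """Holds a thread's inputs; ready once every argument slot is filled."""

    def __init__(self, fn, n_slots):
        self.fn = fn
        self.slots = [None] * n_slots
        self.filled = [False] * n_slots

    def fill(self, index, value):
        # Deliver one argument; return True once all slots are full.
        if self.filled[index]:
            raise ValueError("slot already filled")
        self.slots[index] = value
        self.filled[index] = True
        return all(self.filled)

    def run(self):
        return self.fn(*self.slots)


class Worker:
    """A processor with a queue of ready closures."""

    def __init__(self):
        self.ready = deque()

    def steal_from(self, victim):
        # Only an idle worker steals; it takes the victim's oldest ready
        # closure (assumed policy, for illustration only).
        if not self.ready and victim.ready:
            self.ready.append(victim.ready.popleft())
            return True
        return False


busy, idle = Worker(), Worker()
c = Closure(lambda a, b: a + b, n_slots=2)
first = c.fill(0, 3)     # one slot still empty
second = c.fill(1, 4)    # closure becomes ready
busy.ready.append(c)
stolen = idle.steal_from(busy)
result = idle.ready.popleft().run()
```

The closure only enters a ready queue after its last slot is filled, so a thief never observes a closure with missing inputs.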
Compiling For Multithreaded Architectures
, 1999
Cited by 5 (0 self)
Chapter 1 INTRODUCTION
1.1 Latency Tolerant Architectures
1.2 Exploiting Thread Level Parallelism
1.3 Multithreaded Models
1.4 Motivation
1.5 Claim of Originality
1.6 Synopsis
2 EARTH ARCHITECTURE, EARTH-MANNA SYSTEM, AND PROGRAMMING LANGUAGES
2.1 EARTH Multithreading Model
2.1.1 EARTH Architecture
2.1.2 EARTH Synchronization
2.2 EARTH-MANNA Syst...
Design and Evaluation of Dynamic Load Balancing Schemes under a Fine-grain Multithreaded Execution Model
- In Proc. of the Multithreaded Execution Architecture and Compilation Workshop
, 1999
Cited by 4 (3 self)
The evolution of computer systems based on fine-grain multithreaded program execution models introduces both unique opportunities and tough challenges for the support of dynamic load balancing. Although load balancing is an active research topic in the distributed computing field, there is still no detailed study of the different dynamic load balancing strategies under a fine-grain multithreaded execution environment. This paper describes the design, implementation and performance evaluation of nine dynamic load balancing algorithms running on the EARTHSP multithreaded multiprocessor testbed, a portable implementation of the EARTH multithreaded program execution model [4] on the IBM-SP2 multiprocessor system. In the course of this study we developed a set of generic test cases, which we call stress tests, that measure the performance of the different dynamic load balancing algorithms for specific workload patterns. Based on the experimental results from the stress tests and ...
Advances in the Dataflow Computational Model
, 1999
Cited by 3 (0 self)
The dataflow program graph execution model, or dataflow for short, is an alternative to the stored-program (von Neumann) execution model. Because it relies on a graph representation of programs, the strengths of the dataflow model are very much the complements of those of the stored-program one. In the last thirty or so years since it was proposed, the dataflow model of computation has been used and developed in very many areas of computing research: from programming languages to processor design, and from signal processing to reconfigurable computing. This paper is a review of the current state-of-the-art in the applications of the dataflow model of computation. It focuses on three areas: multithreaded computing, signal processing and reconfigurable computing.
How "hard" Is Thread Partitioning and How "bad" Is a List Scheduling Based Partitioning Algorithm?
- In Proceedings of the 10th ACM Symposium on Parallel Algorithms and Architectures
, 1998
Cited by 3 (0 self)
Adequate compiler support is essential to take advantage of the emerging multithreaded architecture. In this paper, we address two important questions in thread partitioning, which is a key step in compiler design for multithreaded architectures. The questions in which we are interested are: how "hard" is it to partition threads and how "bad" will a heuristic partitioning algorithm be? We propose a cost model for both multithreaded machines and user programs, and we formulate the thread partition problem as an optimization problem. Then, we answer the above two questions by proving that: 1) for the class of programs and architecture models we are interested in, the problem of thread partition for minimum execution time is NP-hard; 2) the run length produced by any list scheduling based thread partitioning algorithm is at most twice as long as that of an optimal solution. 1 Introduction Multithreaded architectures have been attracting increased attention due to their ability of hidi...
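The factor-of-two flavour of the second result can be illustrated with classical greedy list scheduling of independent tasks on identical processors. This is a swapped-in textbook setting, not the paper's thread partitioning cost model: the function names `list_schedule` and `lower_bound` and the sample durations are hypothetical. In this simpler setting the greedy makespan never exceeds twice the larger of the average load and the longest task, and no schedule can finish earlier than that lower bound.

```python
import heapq


def list_schedule(durations, n_procs):
    """Greedy list scheduling: each task, in list order, goes to the
    processor that becomes free first. Returns (makespan, assignment)."""
    loads = [(0, p) for p in range(n_procs)]   # (finish time, processor id)
    heapq.heapify(loads)
    assignment = [None] * len(durations)
    for task, d in enumerate(durations):
        finish, p = heapq.heappop(loads)
        assignment[task] = p
        heapq.heappush(loads, (finish + d, p))
    makespan = max(finish for finish, _ in loads)
    return makespan, assignment


def lower_bound(durations, n_procs):
    # No schedule can beat the average load or the longest single task.
    return max(sum(durations) / n_procs, max(durations))


durations = [3, 3, 2, 2, 2]
makespan, assignment = list_schedule(durations, n_procs=2)
bound = lower_bound(durations, n_procs=2)
```

For this input the greedy schedule finishes at time 7 against a lower bound of 6, comfortably within the factor of two.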
Dynamic Load Balancing Issues In The Earth Runtime System
, 1999
Cited by 2 (0 self)
Multithreading is a promising approach to address the problems inherent in multiprocessor systems, such as network and synchronization latencies. Moreover, the benefits of multithreading are not limited to loop-based algorithms but apply also to irregular parallelism. EARTH (Efficient Architecture for Running THreads) is a multithreaded model supporting fine-grain, non-preemptive threads. This model is supported by a C-based runtime system which provides the multithreaded environment for the execution of concurrent programs. This thesis describes the design and implementation of a set of dynamic load balancing algorithms, and an in-depth study of their behavior with divide-and-conquer, regular, and irregular classes of applications. The results described in this thesis are based on EARTH-SP2, an implementation of the EARTH program execution model on the IBM SP-2, a distributed memory multiprocessor system. The main results of this study are as follows: A randomizing load balance...

