Hypertool: A Programming Aid for Message-Passing Systems
IEEE Trans. on Parallel and Distributed Systems, 1990
Cited by 146 (17 self)
Abstract: As both the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more difficult and error-prone. This paper discusses programming assistance and automation concepts and their application to a program development tool for message-passing systems called Hypertool. It performs scheduling and handles the communication primitive insertion automatically. Two algorithms, based on the critical-path method, are presented for scheduling processes statically. Hypertool also generates the performance estimates and other program quality measures to help programmers in improving their algorithms and programs.
MULTIPROCESSOR SCHEDULING TO ACCOUNT FOR INTERPROCESSOR COMMUNICATION
1991
Cited by 64 (11 self)
Interprocessor communication (IPC) overheads have emerged as the major performance limitation in parallel processing systems, due to the transmission delays, synchronization overheads, and conflicts for shared communication resources created by data exchange. Accounting for these overheads is essential for attaining efficient hardware utilization. This thesis introduces two new compile-time heuristics for scheduling precedence graphs onto multiprocessor architectures, which account for interprocessor communication overheads and interconnection constraints in the architecture. These algorithms perform scheduling and routing simultaneously to account for irregular interprocessor interconnections, and schedule all communications as well as all computations to eliminate shared resource contention. The first technique, called dynamic-level scheduling, modifies the classical HLFET list scheduling strategy to account for IPC and synchronization overheads. By using dynamically changing priorities to match nodes and processors at each step, this technique attains an equitable tradeoff between load balancing and interprocessor communication cost. This method is fast, flexible, widely targetable, and displays promising performance. The second technique, called declustering, establishes a parallelism hierarchy upon the precedence graph using graph-analysis techniques which explicitly address the tradeoff between exploiting parallelism and incurring communication cost. By systematically decomposing this hierarchy, the declustering process exposes parallelism instances in order of importance, assuring efficient use of the available processing resources. In contrast with traditional clustering schemes, this technique can adjust the level of cluster granularity to suit the characteristics of the specified architecture, leading to a more effective solution.
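As a point of reference for the abstract above, classical HLFET-style list scheduling can be sketched in a few lines. This is only an illustrative baseline under stated assumptions, not the dynamic-level or declustering algorithm from the thesis: the function names (`static_levels`, `list_schedule`), the fully connected processor model, and the fixed per-edge communication delay in `comm` are all hypothetical choices made for the sketch.

```python
from functools import lru_cache


def static_levels(exec_time, succs):
    """Static level of a node: longest execution-time path from it to an exit node."""
    @lru_cache(maxsize=None)
    def level(node):
        return exec_time[node] + max(
            (level(s) for s in succs.get(node, ())), default=0
        )
    return {node: level(node) for node in exec_time}


def list_schedule(exec_time, succs, comm, num_procs):
    """Highest-static-level-first list scheduling (assumed model, see lead-in).

    exec_time: node -> execution time
    succs:     node -> tuple of successor nodes (precedence graph, must be acyclic)
    comm:      (pred, succ) -> delay paid only if the two run on different processors
    Returns node -> (processor, start, finish).
    """
    preds = {node: [] for node in exec_time}
    for u, vs in succs.items():
        for v in vs:
            preds[v].append(u)

    level = static_levels(exec_time, succs)
    proc_free = [0] * num_procs   # time at which each processor becomes idle
    placed = {}
    remaining = set(exec_time)

    while remaining:
        # A node is ready once all of its predecessors have been scheduled.
        ready = sorted(n for n in remaining if all(p in placed for p in preds[n]))
        node = max(ready, key=level.get)   # ties go to the first name in sorted order

        best = None
        for p in range(num_procs):
            # Data from a predecessor on another processor arrives after a delay.
            data_ready = max(
                (placed[u][2] + (0 if placed[u][0] == p else comm.get((u, node), 0))
                 for u in preds[node]),
                default=0,
            )
            start = max(proc_free[p], data_ready)
            if best is None or start < best[1]:
                best = (p, start)

        p, start = best
        finish = start + exec_time[node]
        placed[node] = (p, start, finish)
        proc_free[p] = finish
        remaining.remove(node)

    return placed
```

The priority here is fixed before scheduling starts; the abstract's dynamic-level technique differs in that priorities change at each step as nodes are matched to processors.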
NuMesh: An Architecture Optimized for Scheduled Communication
Journal of Supercomputing, 1996
Cited by 16 (1 self)
The NuMesh system defines a high-speed communication substrate optimized for off-line routing. By determining possible communication paths at compile time, highly efficient hardware and software constructs can be exploited to yield superior network performance. These communication paths can be independently tuned to allow more utilized paths greater bandwidth. Although communication paths are scheduled, data need not be sent during every scheduled cycle. Flow control protocols allow for empty communication cycles as well as for data being backed up in the network. Limited gate delays between NuMesh registers, as well as single cycle message transfers, allow for a high clock frequency and low network latency. A highly pipelined architecture for this communication is presented and a mechanism for efficient flow controlled communication
Supporting Sets of Arbitrary Connections on iWarp Through Communication Context Switches
In Proc. SPAA, 1993
Cited by 15 (6 self)
In this paper we introduce the ConSet communication model for distributed memory parallel computers. The communication needs of an application program can be satisfied by some arbitrary set of connections which are partitioned into discrete phases. A communication context switch is used to select the active phase. We present an implementation of the ConSet model on the iWarp and describe its performance characteristics, contrasting it to a message passing implementation on the same machine. Our implementation demonstrates how one existing parallel computer can function as a “reconfigurable network” without needing a new processor interconnect technology. The ConSet model works best when communication patterns can be optimized at compile time. We examine the interactions of the target architecture with the algorithmic problems encountered designing a communication compiler to effectively partition, route, and schedule connections. We built a prototype communication compiler for our iWarp implementation, and are using it to generate iWarp code. Looking at basic communication patterns as well as patterns generated by an iterative finite element PDE solver, we compare ConSet’s performance (using the compiler’s schedules) to that of message passing. Our experiments suggest that ConSet communication offers a performance advantage over message passing in applications where the communication pattern is known at compile time.
Compile-Time Scheduling of Dataflow Program graphs with Dynamic Constructs
University of California, Berkeley, 1992
Application Specific Communication Scheduling on Parallel Systems
In Eighth International Conference on Parallel and Distributed Computing Systems, 1995
Cited by 9 (7 self)
In this paper, communication overhead inherent in parallel processing systems is reduced by considering static communication scheduling of messages in the interconnection network. Static communication scheduling is taken to mean the a priori (compile time) determination of when the nodes should send their messages to other nodes in the network. Although research on static scheduling for parallel systems has been ongoing for many years, our problem has not been rigorously studied. This paper builds a framework based on our newly developed graph model called a Collision Graph. Using this model, determining an optimal schedule is proven to be NP-Complete. Efficient algorithms are designed for a message burst message model.
Index Terms - Communication, scheduling, real-time systems, direct networks, graph modeling.
1 Introduction
In systems requiring high throughput and having real-time deadlines, application specific multi-processor designs are increasingly being used. For example, spa...
Collision Graph based Communication Scheduling for Parallel Systems
Journal of Computers and their Applications, 1997
Cited by 9 (7 self)
Applications such as image processing, fluid mechanics, and geophysical data analysis are examples of problems that require the high computing performance provided by multi-processor systems. Such performance depends highly on the interprocessor communication time resulting from allocating tasks to the individual processors. This research focuses on the development of techniques that reduce the communication overhead by intelligently scheduling message transmissions within a tightly-coupled processor network. Using a collision graph to model the system message traffic, static scheduling algorithms are developed to reduce the communication overhead. Since a priori knowledge about the network, required in static approaches, may not always be available or accurate, dynamic scheduling is also considered. A novel hybrid static-dynamic scheduling approach is presented which operates in a dynamic environment, yet uses known communication pattern information. Results show improvement over base...
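One way to picture static scheduling on a conflict-style graph of message traffic is the sketch below. The listing does not define the Collision Graph, so everything here is an assumption made for illustration: vertices are taken to be messages, an edge joins two messages whose routes share a network link, and time slots are assigned by a plain greedy coloring heuristic rather than by the authors' algorithms. The names `build_conflicts` and `assign_slots` are hypothetical.

```python
from itertools import combinations


def build_conflicts(routes):
    """routes: message id -> set of links on its path.

    Assumed model: two messages conflict when their routes share a link.
    """
    conflicts = {m: set() for m in routes}
    for a, b in combinations(routes, 2):
        if routes[a] & routes[b]:
            conflicts[a].add(b)
            conflicts[b].add(a)
    return conflicts


def assign_slots(conflicts):
    """Greedy coloring: most-conflicted message first, lowest free time slot."""
    slot = {}
    order = sorted(conflicts, key=lambda m: (-len(conflicts[m]), m))
    for m in order:
        used = {slot[n] for n in conflicts[m] if n in slot}
        s = 0
        while s in used:
            s += 1
        slot[m] = s
    return slot


# Hypothetical traffic: each link is a (source node, destination node) pair.
routes = {
    "m1": {("a", "b"), ("b", "c")},
    "m2": {("b", "c")},
    "m3": {("c", "d")},
    "m4": {("a", "b")},
}
slots = assign_slots(build_conflicts(routes))
```

Messages that land in the same slot share no link under this model, so they could be sent concurrently; a greedy heuristic of this kind gives no optimality guarantee.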
Efficient Communication Scheduling with Re-routing based on Collision Graphs
In International Symposium on High Performance Computing Systems, 1997
Cited by 6 (6 self)
Parallel systems are increasingly being used in applications requiring high throughput or which have real-time deadlines because of their potential for computation time savings. However, this savings is often offset by the communication overhead inherent in such systems. In this paper, such a communication overhead was encountered while performing simulations of partial differential equations (representing fluid dynamics problems) by using the multi-dimensional wave filters method. With tightly-coupled architectures as the platform, the static communication scheduling of messages in the network is addressed. The compile time determination of when nodes should send their messages to other nodes in the network is what is termed static communication scheduling. Additionally, the routing of these messages is also addressed. Although the static scheduling of computational tasks has been studied for some time, our problem is very new. This paper utilizes the newly developed Collision Graph ...
Efficient Circuit Partitioning Algorithms For Parallel Logic Simulation
In Proceedings of Supercomputing ’89, 1989
Cited by 5 (0 self)
General purpose parallel processing machines are increasingly being used to speed up a variety of VLSI CAD applications. This paper addresses logic simulation on parallel machines by exploiting the concurrency in the circuit being simulated (called data parallelism) as opposed to exploiting parallelism inherent in the simulation algorithm itself (called functional parallelism). The most crucial step in obtaining the maximum parallelism using data parallelism is the partitioning of circuit elements. We introduce a cost function which tries to model the simulation of a logic circuit in a parallel environment. The cost function tries to estimate the parallel run time for logic simulation given the processor assignment and the underlying multiprocessor architecture. We then present different heuristic algorithms to partition the circuit and evaluate the efficiency of these algorithms using the proposed cost function. Partitioning algorithms for both event-driven and compiled code simulati...
SCORE: An Efficient Technique to reduce Congestion in Parallel Systems
1997
Cited by 5 (5 self)
In massively parallel systems, the performance gains are often significantly diminished by the inherent communication overhead. This overhead is caused by the required message passing resulting from the task allocation scheme. This paper presents the SCORE technique which acts to reduce the overhead by both scheduling the communication and determining the message routing paths. The compile time determination of when nodes send their messages to other network nodes is termed communication scheduling. The recently developed Collision Graph is used to begin the study of compile-time analysis of run-time communication overhead. This NP-complete problem is formalized and heuristics are used in determining the SCORE algorithm which operates on a general case model of message traffic. Experiments performed show that this new method outperforms baseline techniques.
Index Terms - Communication scheduling, parallel systems, graph modeling, routing.
1 Introduction
Techniques to implement mas...

