Results 1 - 10 of 61
BIP: a new protocol designed for high performance networking on Myrinet
- In Workshop PC-NOW, IPPS/SPDP98
, 1998
Cited by 165 (10 self)
Abstract. High-speed networks now provide incredible performance. Software evolution is slow, and the old protocol stacks are no longer adequate for this kind of communication speed. When bandwidth increases, latency should decrease as much in order to keep the system balanced. With current network technology, the main bottleneck is most of the time the software that makes the interface between the hardware and the user. We designed and implemented new transmission protocols targeted at parallel computing that squeeze the most out of the high-speed Myrinet network, without wasting time in system calls or memory copies, giving all the speed to the applications. This design is presented here, as well as experimental results that achieve real Gigabit/s throughput and less than 5 µs latency on a cluster of PC workstations with this affordable network hardware. Moreover, our networking results compare favorably with expensive parallel computers or ATM LANs.
Fast Messages (FM): Efficient, Portable Communication for Workstation Clusters and Massively-Parallel Processors
- IEEE CONCURRENCY
, 1997
Dynamic Coscheduling on Workstation Clusters
- Scheduling Strategies for Parallel Processing, volume 1459 of Lecture Notes in Computer Science
, 1998
Cited by 48 (2 self)
Coscheduling has been shown to be a critical factor in achieving efficient parallel execution in timeshared environments [12, 19, 4]. However, the most common approach, gang scheduling, has limitations in scaling, can compromise good interactive response, and requires that communicating processes be identified in advance. We explore a technique called dynamic coscheduling (DCS) which produces emergent coscheduling of the processes constituting a parallel job. Experiments are performed in a workstation environment with high performance networks and autonomous timesharing schedulers for each CPU. The results demonstrate that DCS can achieve effective, robust coscheduling for a range of workloads and background loads. Empirical comparisons to implicit scheduling and uncoordinated scheduling are presented. Under spin-block synchronization, DCS reduces job response times by up to 20% over implicit scheduling while maintaining fairness; and under spinning synchronization, DCS reduces ...
Efficient Layering for High Speed Communication: Fast Message 2.x
- In Proceedings of the 7th High Performance Distributed Computing (HPDC7)
, 1998
Cited by 40 (4 self)
Portals 3.0: Protocol Building Blocks for Low Overhead Communication
- in Proceedings of the 2002 Workshop on Communication Architecture for Clusters
, 2002
Cited by 38 (17 self)
This paper describes the evolution of the Portals message passing architecture and programming interface from its initial development on tightly-coupled massively parallel platforms to the current implementation running on a 1792-node commodity PC Linux cluster. Portals provides the basic building blocks needed for higher-level protocols to implement scalable, low-overhead communication. Portals has several unique characteristics that differentiate it from other high-performance system-area data movement layers. This paper discusses several of these features and illustrates how they can impact the scalability and performance of higher-level message passing protocols.
A closer look at coscheduling approaches for a network of workstations
- In Eleventh ACM Symposium on Parallel Algorithms and Architectures, SPAA'99
, 1999
Cited by 26 (7 self)
Efficient scheduling of processes on processors of a Network of Workstations (NOW) is essential for good system performance. However, the design of such schedulers is challenging because of the complex interaction between several system and workload parameters. Coscheduling, though desirable, is impractical for such a loosely coupled environment. Two operations, waiting for a message and arrival of a message, can be used to take remedial actions that can guide the behavior of the system towards coscheduling using local information. We present a taxonomy of three possibilities for each of these two operations, leading to a design space of 3 × 3 scheduling mechanisms. This paper presents an extensive implementation and evaluation exercise in studying these mechanisms. Adhering to the philosophy that scheduling and communication are intertwined and should be studied in conjunction, a complete communication substrate for UltraSPARC workstations, connected by Myrinet and running Solaris 2.5.1, has been developed. This platform provides the entire Message Passing Interface (MPI) to readily run off-the-shelf MPI applications by employing protected low-latency user-level messaging. Several applications can concurrently use this interface. This platform has been used to design, implement, and uniformly evaluate nine scheduling strategies with a mixture of concurrent real applications with varying communication intensities. This includes four new schemes (Periodic Boost,
The design for a high performance MPI implementation on the Myrinet network
, 1999
Cited by 21 (3 self)
We present our MPI-BIP implementation, designed for Myrinet networks and based on MPICH. By using our Basic Interface for Parallelism (BIP) software layer, we obtain in this implementation of the MPI protocols results close to the peak hardware performance of the high-speed Myrinet network. We present the protocols we used to implement the MPI semantics, and the overall design of the implementation. We then present benchmarks and application results to show that this design leads to parallel multicomputer-like throughput and latency on a cluster of PC workstations. 1 Introduction In the last decade, researchers tried to use COWs (Clusters Of Workstations) as parallel computers. These clusters are typically connected by Ethernet networks and are often programmed with communication libraries like PVM (Parallel Virtual Machine [6]), or MPI over IP (Internet Protocol). There are two bottlenecks in these solutions that can restrict application programmers to coarse grain paral...
Implementing an API for Distributed Adaptive Computing Systems
- In Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines
, 1999
Cited by 20 (3 self)
Many applications require the use of multiple, loosely-coupled adaptive computing boards as part of a larger computing system. Two such application classes are embedded systems in which multiple boards are required to physically interface to different sensors/actuators and applications whose computational demands require multiple
LFC: A Communication Substrate for Myrinet
, 1998
Cited by 18 (0 self)
LFC is a new, low-level communication substrate for Myrinet, designed to support the development of high-performance communication software for parallel systems. LFC is novel in two ways. First, it exploits Myrinet's programmable network interface (NI) to implement flow control, forward multicast traffic, reduce the overhead of network interrupts, and to provide a network-wide fetch-and-add operation. Second, LFC uses a single flow control mechanism at the network interface level for both point-to-point and multicast traffic. The integrated flow control mechanism significantly simplifies the implementation of an efficient multicast. We describe the design and implementation of LFC; we also evaluate LFC's performance by comparing LFC with two high-performance message-passing systems for Myrinet. Finally, we outline the implementation of two client systems that use LFC: CRL, a distributed shared memory system, and MPI, a standard message-passing system. 1 Introduction LFC (Link-level Fl...
MPICH for SCI-connected Clusters
, 1999
Cited by 17 (3 self)
MPICH is the most commonly used, freely available implementation of the MPI-1 standard, including parts of the MPI-2 standard. It is available for nearly every Unix-based system and can use a variety of communication facilities through its low-level Abstract Device Interface (ADI-2). However, no adaptation to the Scalable Coherent Interface (SCI) existed so far. This paper presents the design and implementation of such an adaptation, consisting of an ADI-2 device for the current MPICH distribution. The performance of this device is compared to other ADI-2 devices of MPICH usable on Intel x86 based clusters and also with a commercial MPI implementation for SCI-connected clusters. Keywords: message passing, cluster, SCI, MPI, MPICH, ADI-2 I. INTRODUCTION Since the presentation of the first standard [1] in 1994, the Message Passing Interface (MPI) has become one of the most commonly used APIs for parallel computing due to its availability on nearly every parallel computer. Contrariwise, this...

