An Implementation and Analysis of the Virtual Interface Architecture
- IN PROCEEDINGS OF SC'98
, 1998
Cited by 91 (7 self)
Rapid developments in networking technology and a rise in clustered computing have driven research studies in high performance communication architectures. In an effort to standardize the work in this area, industry leaders have developed the Virtual Interface Architecture (VIA) specification. This architecture seeks to provide an operating system-independent infrastructure for high-performance user-level networking in a generic environment. This paper evaluates the inherent costs and performance potential of the Virtual Interface Architecture through a prototype implementation over Myrinet. The VIA prototype is compared against established research user-level networks using simple communication benchmarks on the same hardware. We consider extensions to the VI Architecture that improve its performance for certain types of communication traffic and outline further research areas in the VIA design space that merit investigation.
Automatically Tuned Collective Communications
- In Proceedings of SC99: High Performance Networking and Computing
, 2000
Cited by 40 (8 self)
The performance of MPI's collective communications is critical in most MPI-based applications. A general algorithm for a given collective communication operation may not give good performance on all systems due to the differences in architectures, network parameters and the storage capacity of the underlying MPI implementation. In this paper, we discuss an approach in which the collective communications are tuned for a given system by conducting a series of experiments on the system. We also discuss a dynamic topology method that uses the tuned static topology shape, but re-orders the logical addresses to compensate for changing run time variations. A series of experiments were conducted comparing our tuned collective communication operations to various native vendor MPI implementations. The use of the tuned collective communications resulted in about 30%-650% improvement in performance over the native MPI implementations. 1. INTRODUCTION This project developed out of an attempt...
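The tuning idea in this abstract (run experiments on the target system, then pick the best collective algorithm per message size) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three broadcast cost models, the names `tune` and `simulated_measure`, and the latency and bandwidth values are hypothetical and are not taken from the paper. A real tuner would replace `simulated_measure` with timed runs of actual MPI collectives.

```python
import math

# Hypothetical cost models (in seconds) for broadcasting nbytes to nprocs ranks.
# They stand in for real timed experiments on the target system.
def linear_cost(nprocs, nbytes, latency, bandwidth):
    # The root sends the full message to every other rank in turn.
    return (nprocs - 1) * (latency + nbytes / bandwidth)

def binomial_cost(nprocs, nbytes, latency, bandwidth):
    # The number of ranks holding the message doubles each round.
    return math.ceil(math.log2(nprocs)) * (latency + nbytes / bandwidth)

def pipeline_cost(nprocs, nbytes, latency, bandwidth, segment=8192):
    # Ranks form a chain; the message is split into segments that overlap in flight.
    nseg = max(1, math.ceil(nbytes / segment))
    seg_bytes = nbytes / nseg
    return (nprocs - 2 + nseg) * (latency + seg_bytes / bandwidth)

CANDIDATES = {
    "linear": linear_cost,
    "binomial": binomial_cost,
    "pipeline": pipeline_cost,
}

def simulated_measure(name, nprocs, nbytes, latency=20e-6, bandwidth=100e6):
    # Assumed system parameters: 20 microseconds latency, 100 MB/s bandwidth.
    return CANDIDATES[name](nprocs, nbytes, latency, bandwidth)

def tune(nprocs, sizes, measure):
    """Return {message size: name of the fastest candidate} for this system."""
    table = {}
    for size in sizes:
        timings = {name: measure(name, nprocs, size) for name in CANDIDATES}
        table[size] = min(timings, key=timings.get)
    return table

if __name__ == "__main__":
    print(tune(16, [8, 1 << 20], simulated_measure))
```

Under these assumed parameters the lookup table selects a tree-shaped broadcast for small messages and a segmented pipeline for large ones, which is the kind of size-dependent decision a tuned collective layer would store and consult at run time.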
Using Network Interface Support to Avoid Asynchronous Protocol Processing in Shared Virtual Memory Systems
- In Proceedings of the 26th International Symposium on Computer Architecture
, 1999
Cited by 40 (7 self)
The performance of page-based software shared virtual memory (SVM) is still far from that achieved on hardware-coherent distributed shared memory (DSM) systems. The interrupt cost for asynchronous protocol processing has been found to be a key source of performance loss and complexity. This paper shows that by providing simple and general support for asynchronous message handling in a commodity network interface (NI), and by altering SVM protocols appropriately, protocol activity can be decoupled from asynchronous message handling and the need for interrupts or polling can be eliminated. The NI mechanisms needed are generic, not SVM-dependent. They also require neither visibility into the node memory system nor code instrumentation to identify memory operations. We prototype the mechanisms and such a synchronous home-based LRC protocol, called GeNIMA (GEneral-purpose Network Interface support in a shared Memory Abstraction), on a cluster of SMPs with a programmable NI, though the mechan...
Portals 3.0: Protocol Building Blocks for Low Overhead Communication
- in Proceedings of the 2002 Workshop on Communication Architecture for Clusters
, 2002
Cited by 38 (17 self)
This paper describes the evolution of the Portals message passing architecture and programming interface from its initial development on tightly-coupled massively parallel platforms to the current implementation running on a 1792-node commodity PC Linux cluster. Portals provides the basic building blocks needed for higher-level protocols to implement scalable, low-overhead communication. Portals has several unique characteristics that differentiate it from other high-performance system-area data movement layers. This paper discusses several of these features and illustrates how they can impact the scalability and performance of higher-level message passing protocols.
User-Space Communication: A Quantitative Study
, 1998
Cited by 31 (5 self)
Powerful commodity systems and networks offer a promising direction for high performance computing because they are inexpensive and they closely track technology progress. However, high raw-hardware performance is rarely delivered to the end user. Previous work has shown that the bottleneck in these architectures is the overheads imposed by the software communication layer. To reduce these overheads, researchers have proposed a number of user-space communication models. The common feature of these models is that applications have direct access to the network, bypassing the operating system in the common case and thus avoiding the cost of send/receive system calls. In this paper we examine five user-space communication layers that represent different points in the configuration space: Generic AM, BIP-0.92, FM-2.02, PM-1.2, and VMMC-2. Although these systems support different communication paradigms and employ a variety of different implementation tradeoffs, we are able to quantitatively...
MPI-StarT: Delivering Network Performance to Numerical Applications
- In SC
, 1998
Cited by 29 (1 self)
We describe an MPI implementation for a cluster of SMPs interconnected by a high-performance interconnect. This work is a collaboration between a numerical applications programmer and a cluster interconnect architect. The collaboration started with the modest goal of satisfying the communication needs of a specific numerical application, MITMatlab. However, by supporting the MPI standard, MPI-StarT readily extends support to a host of applications. MPI-StarT is derived from MPICH by developing a custom implementation of the Channel Interface. Some changes in MPICH's ADI and Protocol Layers are also necessary for correct and optimal operation. MPI-StarT relies on the host SMPs' shared memory mechanism for intra-SMP communication. Inter-SMP communication is supported through StarT-X. The StarT-X NIU allows a cluster of PCI-equipped host platforms to communicate over the Arctic Switch Fabric. Currently, StarT-X is utilized by a cluster of SUN E5000 SMPs as well as a cluster of Intel Pen...
SPINE: An operating system for intelligent network adapters
, 1998
Cited by 21 (5 self)
The emergence of fast, cheap embedded processors presents the opportunity for processing to occur on the network adapter. We are investigating how a system design incorporating such an intelligent network adapter can be used for applications that benefit from being tightly integrated with the network subsystem. We are developing a safe, extensible operating system, called SPINE, which enables applications to compute directly on the network adapter. We demonstrate the feasibility of our approach with two applications: a video client and an Internet Protocol router. As a result of our system structure, image data is transferred only once over the I/O bus and places no load on the host CPU to display video at aggregate rates exceeding 100 Mbps. Similarly, the IP router can forward roughly 10,000 packets per second on each network adapter, while placing no load on the host CPU. Based on our experiences, we describe three hardware features useful for improving performance. Finally, we conclude that offloading work to the network adapter can make sense, even using current embedded processor technology.
The design for a high performance MPI implementation on the Myrinet network
, 1999
Cited by 21 (3 self)
We present our MPI-BIP implementation, designed for Myrinet networks, and based on MPICH. By using our BIP (Basic Interface for Parallelism) software layer, we obtain in this implementation of the MPI protocols results close to the peak hardware performance of the high speed Myrinet network. We present the protocols we used to implement the MPI semantics, and the overall design of the implementation. We then present benchmarks and application results to show that this design leads to parallel multicomputer-like throughput and latency on a cluster of PC workstations. 1 Introduction In the last decade, researchers tried to use COWs (Cluster Of Workstations) as parallel computers. These clusters are typically connected by Ethernet networks and are often programmed with communication libraries like PVM (Parallel Virtual Machine [6]), or MPI over IP (Internet Protocol). There are two bottlenecks in these solutions that can restrict application programmers to coarse grain paral...
The Design and Evaluation of High Performance Communication Using a Gigabit Ethernet
, 1999
Cited by 20 (1 self)
A high performance communication facility, called the GigaE PM, has been designed and implemented for parallel applications on clusters of computers using a Gigabit Ethernet. The GigaE PM provides not only a reliable high bandwidth and low latency communication function, but also supports existing network protocols such as TCP/IP. In the design of the GigaE PM, it is assumed that the Gigabit Ethernet card used has a dedicated processor and its program can be modified. A reliable communication mechanism for a parallel application is implemented on the firmware while existing network protocols are handled by an operating system kernel. A prototype system has been implemented using an Essential Communications Gigabit Ethernet card. The performance results show that a 48.3 µs round trip time for a four byte user message, and 56.7 MBytes/sec bandwidth for a 1,468 byte message have been achieved on Intel Pentium II 400 MHz PCs. We have implemented MPICH-PM on top of the GigaE PM, and evaluat...
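As a rough reading aid for the figures quoted in this abstract, the sketch below derives a one-way latency estimate, the time to move one 1,468-byte message, and the fraction of the nominal link rate in use. The unit conventions are assumptions rather than statements from the paper: the round-trip figure is read as microseconds, one MByte is taken as 10^6 bytes, and the link is taken as a nominal 1000 Mbit/s. The function names are hypothetical.

```python
# Figures quoted in the abstract; the unit conventions are assumptions.
RTT_US = 48.3            # round trip for a 4-byte message, assumed microseconds
BANDWIDTH_MB_S = 56.7    # assumed 1 MByte = 10**6 bytes
MESSAGE_BYTES = 1468     # message size used for the bandwidth figure
LINK_MBIT_S = 1000.0     # nominal Gigabit Ethernet rate

def one_way_latency_us(rtt_us):
    # Half the round trip, assuming a symmetric path.
    return rtt_us / 2.0

def transfer_time_us(nbytes, mbytes_per_s):
    # nbytes / (mbytes_per_s * 1e6) seconds equals nbytes / mbytes_per_s microseconds.
    return nbytes / mbytes_per_s

def link_utilisation(mbytes_per_s, link_mbit_s):
    # Payload bits per second divided by the nominal link rate.
    return mbytes_per_s * 8.0 / link_mbit_s

if __name__ == "__main__":
    print(f"one-way latency: {one_way_latency_us(RTT_US):.2f} us")
    print(f"time per message: {transfer_time_us(MESSAGE_BYTES, BANDWIDTH_MB_S):.2f} us")
    print(f"link utilisation: {link_utilisation(BANDWIDTH_MB_S, LINK_MBIT_S):.1%}")
```

Under these assumptions the reported bandwidth corresponds to a little under half of the nominal link rate.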
The MultiCluster Model to the Integrated Use of Multiple Workstation Clusters
- Proc. of the 3rd Workshop on Personal Computer-based Networks of Workstations, 2000
, 2000
Cited by 18 (4 self)
One of the new research tendencies within the well-established cluster computing area is the growing interest in the use of multiple workstation clusters as a single virtual parallel machine, in much the same way as individual workstations are nowadays connected to build a single parallel cluster. In this paper we present an analysis of several aspects concerning the integration of different workstation clusters, such as Myrinet and SCI, and propose our MultiCluster model as an alternative to achieve such an integrated architecture. 1 Introduction Cluster computing is nowadays a common practice among many research groups around the world that search for high performance for a great variety of parallel and distributed applications, like aerospace and molecular simulations, Web servers, data mining, and so forth. To achieve high performance, many efforts have been devoted to the design and implementation of low overhead communication libraries, specially dedicated to fast communicat...

