Virtual Memory Mapped Network Interface for the SHRIMP Multicomputer
- In Proceedings of the 21st Annual International Symposium on Computer Architecture
, 1994
Cited by 241 (24 self)
The network interfaces of existing multicomputers require a significant amount of software overhead to provide protection and to implement message passing protocols. This paper describes the design of a low-latency, high-bandwidth, virtual memory-mapped network interface for the SHRIMP multicomputer project at Princeton University. Without sacrificing protection, the network interface achieves low latency by using virtual memory mapping and write-latency hiding techniques, and obtains high bandwidth by providing a user-level block data transfer mechanism. We have implemented several message passing primitives in an experimental environment, demonstrating that our approach can reduce the message passing overhead to a few user-level instructions.
Managing Multiple Communication Methods in High-Performance Networked Computing Systems
- Journal of Parallel and Distributed Computing
, 1997
Cited by 79 (13 self)
Modern networked computing environments and applications often require, or can benefit from, the use of multiple communication substrates, transport mechanisms, and protocols, chosen according to where communication is directed, what is communicated, or when communication is performed. We propose techniques that allow multiple communication methods to be supported transparently in a single application, with either automatic or user-specified selection criteria guiding the methods used for each communication. We explain how communication link and remote service request mechanisms facilitate the specification and implementation of multimethod communication. These mechanisms have been implemented in the Nexus multithreaded runtime system, and we use this system to illustrate solutions to various problems that arise when implementing multimethod communication. We also illustrate the application of our techniques by describing a multimethod, multithreaded implementation of the Message Pas...
VMMC-2: Efficient Support for Reliable, Connection-Oriented Communication
- In Proceedings of Hot Interconnects
, 1997
Cited by 71 (18 self)
The basic virtual memory-mapped communication (VMMC) model provides protected, direct communication between the sender's and receiver's virtual address spaces, but it does not support high-level connection-oriented communication APIs well. This paper presents VMMC-2, an extension to the basic VMMC. We describe the design and implementation, and evaluate the performance, of three mechanisms in VMMC-2: (1) a user-managed TLB mechanism for address translation, which enables user libraries to dynamically manage the amount of pinned space and requires only driver support from many operating systems; (2) a transfer redirection mechanism, which avoids copying on the receiver's side; and (3) a reliable communication protocol at the data link layer, which avoids copying on the sender's side. To validate our extensions, we implemented stream sockets on top of VMMC-2 running on a Myrinet network of Pentium PCs. This zero-copy sockets implementation provides a maximum bandwidth of over 84 Mbytes/s and a one-way latency of 20 µs.
Flick: A Flexible, Optimizing IDL Compiler
- In Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation (PLDI), Las Vegas, NV, ACM
, 1997
Cited by 66 (4 self)
An interface definition language (IDL) is a nontraditional language for describing interfaces between software components. IDL compilers generate "stubs" that provide separate communicating processes with the abstraction of local object invocation or procedure call. High-quality stub generation is essential for applications to benefit from component-based designs, whether the components reside on a single computer or on multiple networked hosts. Typical IDL compilers, however, do little code optimization, incorrectly assuming that interprocess communication is always the primary bottleneck. More generally, typical IDL compilers are "rigid" and limited to supporting only a single IDL, a fixed mapping onto a target language, and a narrow range of data encodings and transport mechanisms. Flick, our new IDL compiler, is based on the insight that IDLs are true languages amenable to modern compilation techniques. Flick exploits concepts from traditional programming language compilers to br...
Efficient Layering for High Speed Communication: Fast Message 2.x
- In Proceedings of the 7th High Performance Distributed Computing (HPDC7)
, 1998
Cited by 40 (4 self)
Using Network Interface Support to Avoid Asynchronous Protocol Processing in Shared Virtual Memory Systems
- In Proceedings of the 26th International Symposium on Computer Architecture
, 1999
Cited by 40 (7 self)
The performance of page-based software shared virtual memory (SVM) is still far from that achieved on hardware-coherent distributed shared memory (DSM) systems. The interrupt cost for asynchronous protocol processing has been found to be a key source of performance loss and complexity. This paper shows that by providing simple and general support for asynchronous message handling in a commodity network interface (NI), and by altering SVM protocols appropriately, protocol activity can be decoupled from asynchronous message handling and the need for interrupts or polling can be eliminated. The NI mechanisms needed are generic, not SVM-dependent. They also require neither visibility into the node memory system nor code instrumentation to identify memory operations. We prototype the mechanisms and such a synchronous home-based LRC protocol, called GeNIMA (GEneral-purpose Network Interface support in a shared Memory Abstraction), on a cluster of SMPs with a programmable NI, though the mechan...
Realizing the Performance Potential of the Virtual Interface Architecture
, 1999
Cited by 30 (1 self)
The Virtual Interface (VI) Architecture provides protected user-level communication with high delivered bandwidth and low per-message latency, particularly for small messages. The VI Architecture attempts to reduce latency by eliminating user/kernel transitions on routine data transfers and by allowing direct use of user memory for network buffering. This results in significantly lower latencies than those achieved by network protocols such as TCP/IP and UDP. In this paper we examine the low-level performance of two VI implementations, one implemented in hardware, the other implemented in device driver software. Using a set of low-level benchmarks, we measure bandwidth, latency, and processor utilization as a function of message size for the GigaNet cLAN and Tandem ServerNet VI implementations. We report that both VI implementations offer a significant performance advantage relative to the corresponding UDP implementation on the same hardware. We also investigate the problems associated wi...
Early Experience with Message-Passing on the SHRIMP Multicomputer
- In Proceedings of the 23rd Annual Symposium on Computer Architecture
, 1996
Cited by 27 (13 self)
The SHRIMP multicomputer provides virtual memory-mapped communication (VMMC), which supports protected, user-level message passing, allows user programs to perform their own buffer management, and separates data transfers from control transfers so that a data transfer can be done without the intervention of the receiving node CPU. An important question is whether such a mechanism can indeed deliver all of the available hardware performance to applications which use conventional message-passing libraries. This paper
A MultiPlatform Co-Array Fortran Compiler
- In Proceedings of the 13th Intl. Conference of Parallel Architectures and Compilation Techniques, Antibes Juan-les-Pins
, 2004
Cited by 26 (7 self)
Co-array Fortran (CAF), a small set of extensions to Fortran 90, is an emerging model for scalable, global address space parallel programming. CAF’s global address space programming model simplifies the development of single-program-multiple-data parallel programs by shifting the burden for managing the details of communication from developers to compilers. This paper describes cafc, a prototype implementation of an open-source, multiplatform CAF compiler that generates code well-suited for today’s commodity clusters. The cafc compiler translates CAF into Fortran 90 plus calls to one-sided communication primitives. The paper describes key details of cafc’s approach to generating efficient code for multiple platforms. Experiments compare the performance of CAF and MPI versions of several NAS parallel benchmarks on an Alpha cluster with a Quadrics interconnect, an Itanium 2 cluster with a Myrinet 2000 interconnect, and an Itanium 2 cluster with a Quadrics interconnect. These experiments show that cafc compiles CAF programs into code that delivers performance roughly equal to that of hand-optimized MPI programs.
Software Distributed Shared Memory over Virtual Interface Architecture: Implementation and Performance
- In Proceedings of the 3rd Extreme Linux Workshop
, 2000
Cited by 22 (2 self)
In this paper, we describe an implementation of a software Distributed Shared Memory (DSM) over Virtual Interface Architecture (VIA) for a Linux-based cluster of PCs and evaluate its performance. VIA is a user-level memory-mapped communication model that provides zero-copy communication and low overhead by excluding the operating system kernel from the communication path. To the best of our knowledge, our implementation is the first software DSM protocol on VIA. The DSM protocol we have implemented on VIA is Home-based Lazy Release Consistency (HLRC), which previous studies have shown to exhibit good scalability by reducing the number of messages and the memory overhead compared to its homeless counterpart. The experimental results obtained on seven Splash-2 applications show that VIA can be successfully used to support software shared memory on clusters of PCs. The paper is accompanied by a source-code distribution of the software DSM protocol for Linux/VIA clusters.

