U-Net: A User-Level Network Interface for Parallel and Distributed Computing
- In Fifteenth ACM Symposium on Operating System Principles
, 1995
Cited by 518 (14 self)
The U-Net communication architecture provides processes with a virtual view of a network interface to enable user-level access to high-speed communication devices. The architecture, implemented on standard workstations using off-the-shelf ATM communication hardware, removes the kernel from the communication path, while still providing full protection. The model presented by U-Net allows for the construction of protocols at user level whose performance is limited only by the capabilities of the network. The architecture is extremely flexible in the sense that traditional protocols like TCP and UDP, as well as novel abstractions like Active Messages, can be implemented efficiently. A U-Net prototype on an 8-node ATM cluster of standard workstations offers 65 microseconds round-trip latency and 15 Mbytes/sec bandwidth. It achieves TCP performance at maximum network bandwidth and demonstrates performance equivalent to Meiko CS-2 and TMC CM-5 supercomputers on a set of Split-C benchmarks.
High performance messaging on workstations: Illinois Fast Messages (FM) for Myrinet
- In Supercomputing
, 1995
Cited by 280 (16 self)
In most computer systems, software overhead dominates the cost of messaging, reducing delivered performance, especially for short messages. Efficient software messaging layers are needed to deliver the hardware performance to the application level and to support tightly-coupled workstation clusters. Illinois Fast Messages (FM) 1.0 is a high speed messaging layer that delivers low latency and high bandwidth for short messages. For 128-byte packets, FM achieves bandwidths of 16.2 MB/s and one-way latencies of 32 µs on Myrinet-connected SPARCstations (user-level to user-level). For shorter packets, we have measured one-way latencies of 25 µs, and for larger packets, bandwidth as high as 19.6 MB/s, a delivered bandwidth greater than OC-3. FM is also superior to the Myrinet API messaging layer, not just in terms of latency and usable bandwidth, but also in terms of the message half-power point (n_1/2), which is two orders of magnitude smaller (54 vs. 4,409 bytes). We describe the FM messaging primitives and the critical design issues in building low-latency messaging layers for workstation clusters. Several issues are critical: the division of labor between host and network coprocessor, management of the input/output (I/O) bus, and buffer management. To achieve high performance, messaging layers should assign as much functionality as possible to the host. If the network interface has DMA capability, the I/O bus should be used asymmetrically, with ...
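The half-power point mentioned in this abstract is the message size at which delivered bandwidth reaches half of its peak. A minimal sketch, assuming a simple linear cost model (time = fixed overhead + size / peak rate) that the abstract itself does not state; the function names and the numbers in the usage note are hypothetical and are not measurements from the paper:

```python
import math


def transfer_time(n_bytes, t0, r_inf):
    """Time to move n_bytes under the assumed linear model.

    t0    -- fixed per-message overhead in seconds (assumed parameter)
    r_inf -- asymptotic peak bandwidth in bytes/second (assumed parameter)
    """
    return t0 + n_bytes / r_inf


def bandwidth(n_bytes, t0, r_inf):
    """Delivered bandwidth (bytes/second) for a message of n_bytes."""
    return n_bytes / transfer_time(n_bytes, t0, r_inf)


def half_power_point(t0, r_inf):
    """Message size at which bandwidth equals r_inf / 2.

    Solving n / (t0 + n / r_inf) = r_inf / 2 for n gives n = t0 * r_inf.
    """
    return t0 * r_inf


# Hypothetical values chosen only to exercise the formula.
example_t0 = 10e-6      # 10 microseconds of fixed overhead
example_r_inf = 20e6    # 20 MB/s peak bandwidth
example_n_half = half_power_point(example_t0, example_r_inf)
assert math.isclose(
    bandwidth(example_n_half, example_t0, example_r_inf), example_r_inf / 2
)
```

Under this model a smaller half-power point means the fixed per-message overhead is small relative to the peak rate, which is why it is a useful figure of merit for short-message performance.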
MPI-FM: High Performance MPI on Workstation Clusters
- Journal of Parallel and Distributed Computing
, 1997
Cited by 71 (13 self)
Despite the emergence of high speed LANs, the communication performance available to applications on workstation clusters still falls short of that available on MPPs. A new generation of efficient messaging layers is needed to take advantage of the hardware performance and to deliver it to the application level. Communication software is the key element in bridging the communication performance gap separating MPPs and workstation clusters. MPI-FM is a high performance implementation of MPI for networks of workstations connected with a Myrinet network, built on top of the Fast Messages (FM) library. Based on the FM version 1.1 released in Fall 1995, MPI-FM achieves a minimum one-way latency of 19 µs and a peak bandwidth of 17.3 MByte/s with common MPI send and receive function calls. A direct comparison using published performance figures shows that MPI-FM running on SPARCstation 20 workstations connected with a relatively inexpensive Myrinet network outperforms the MPI implementations a...
Low-Latency Communication over ATM Networks using Active Messages
- IEEE Micro
, 1995
Cited by 68 (0 self)
Recent developments in communication architectures for parallel machines have made significant progress and reduced the communication overheads and latencies by over an order of magnitude as compared to earlier proposals. This paper examines whether these techniques can carry over to clusters of workstations connected by an ATM network even though clusters use standard operating system software, are equipped with network interfaces optimized for stream communication, do not allow direct protected user-level access to the network, and use networks without reliable transmission or flow control. In a first part, this paper describes the differences in communication characteristics between clusters of workstations built from standard hardware and software components and state-of-the-art multiprocessors. The lack of flow control and of operating system coordination affects the communication layer design significantly and requires larger buffers at each end than on multiprocessors. A second ...
Efficient collective communication on heterogeneous networks of workstations
- In International Conference on Parallel Processing
, 1998
Efficient Layering for High Speed Communication: Fast Messages 2.x
- In Proceedings of the 7th High Performance Distributed Computing (HPDC7)
, 1998
Cited by 40 (4 self)
Performance Evaluation of a Cluster-Based Multiprocessor Built from ATM Switches and Bus-Based Multiprocessor Servers
- In the 2nd IEEE Symposium on High-Performance Computer Architecture
, 1996
Cited by 31 (1 self)
We consider a network of workstations (NOW) organization consisting of a number of bus-based multiprocessor servers interconnected by an ATM switch. A shared-memory model is supported by distributed virtual shared memory (DVSM) and this paper focuses on the access penalties incurred by (1) ATM and (2) the DVSM software. First, through detailed architectural simulations we find that while the bandwidth and the latency of the ATM switch fabrics are acceptable, the latency incurred by commercially available ATM interfaces has a first order effect on the performance. We also study the effects of various scheduling policies for the coherence handlers. Our data suggest that since the probability of finding an idle processor within a cluster is high, a good policy is to schedule it there instead of letting an extra compute processor execute coherence handlers. Overall, by adjusting the adaptation layer of ATM to a DVSM system we find that ATM is a promising technology for these kinds of systems.
Multicast on Irregular Switch-based Networks with Wormhole Routing
- In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA-3)
, 1997
Cited by 31 (10 self)
This paper presents efficient multicasting with reduced contention on irregular networks with switch-based wormhole interconnection and unicast message passing. First, it is proved that for an arbitrary irregular network with a typical deadlock-free, adaptive routing, it may not be possible to create an ordered list of nodes to implement an arbitrary multicast in a contention-free manner with minimal number of communication steps. Next, three different multicast algorithms are proposed with their respective node orderings to reduce contention: switch-based ordering (SO), switch-based hierarchical ordering (SHO), and chain concatenation ordering (CCO). A variation of a binomial tree-based communication pattern with unicast message passing is used on the above ordered lists to implement multicast. The proposed multicast algorithms are compared with each other as well as with the naive random ordering (RO) algorithm for a range of system sizes, switch sizes, message lengths, degrees of co...
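To illustrate what a binomial tree-based multicast over an ordered node list looks like when built from unicast messages, here is a minimal sketch. It uses the common recursive-doubling form of the binomial tree, which is an assumption and may differ from the variation used in the paper; it does not implement the SO, SHO, or CCO orderings, and the function name is hypothetical.

```python
def binomial_multicast_schedule(nodes):
    """Return the unicast steps of a binomial-tree multicast.

    nodes -- ordered list of node identifiers; nodes[0] is the source.
    The result is a list of steps, each a list of (sender, receiver)
    pairs that can proceed in the same communication step.
    """
    total = len(nodes)
    steps = []
    informed = 1  # nodes[0:informed] already hold the message
    while informed < total:
        step = []
        # Every informed node forwards to the node `informed` positions ahead.
        for i in range(informed):
            j = i + informed
            if j < total:
                step.append((nodes[i], nodes[j]))
        steps.append(step)
        informed = min(2 * informed, total)
    return steps


# Print the schedule for an 8-node ordered list (node names are placeholders).
for number, step in enumerate(binomial_multicast_schedule(list(range(8))), 1):
    print(f"step {number}: {step}")
```

The number of informed nodes doubles each step, so a multicast to n nodes finishes in ceil(log2 n) steps. Which sender and receiver pairs share a step depends on the position of each node in the list, which is why the choice of ordering affects link contention.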
Communication modeling of heterogeneous networks of workstations for performance characterization of collective operations
- In HCW’99, the 8th Heterogeneous Computing Workshop
, 1999
Cited by 29 (0 self)
Networks of Workstations (NOW) have become an attractive alternative platform for high performance computing. Due to the commodity nature of workstations and interconnects and due to the multiplicity of vendors and platforms, the NOW environments are being gradually redefined as Heterogeneous Networks of Workstations (HNOW). Having an accurate model for the communication in HNOW systems is crucial for design and evaluation of efficient communication layers for such systems. In this paper we present a model for point-to-point communication in HNOW systems and show how it can be used for characterizing the performance of different collective communication operations. In particular, we show how the performance of broadcast, scatter, and gather operations can be modeled and analyzed. We also verify the accuracy of our proposed model by using an experimental HNOW testbed. Furthermore, it is shown how this model can be used for comparing the performance of different collective communication algorithms. We also show how the effect of heterogeneity on the performance of collective communication operations can be predicted.
Issues in ATM Support of High Performance Geographically Distributed Computing
- In First International Workshop on High-Speed Network Computing
, 1995
Cited by 14 (1 self)
This paper provides an experimental assessment of the impact of the underlying networking in a cluster-based computing environment. The assessment is quantified through application level benchmarking, process level communication, and network file I/O. Two testbeds are considered: one small cluster of SUN workstations and another larger cluster composed of 32 high-end IBM RS/6000 platforms. The cluster machines have Ethernet, FDDI, Fiber Channel and ATM network interface cards installed. This provides a consistent testbed since the processors and operating system are identical for this suite of experiments. A major issue to be examined is ATM-based local area networks. In particular, a primary goal of this work is to assess the suitability of an ATM-based network to support interprocess communication and remote file I/O systems for distributed computing.

