Eliminating receive livelock in an interrupt-driven kernel
ACM Transactions on Computer Systems, 1997
Cited by 241 (4 self)
Most operating systems use interface interrupts to schedule network tasks. Interrupt-driven systems can provide low overhead and good latency at low offered load, but degrade significantly at higher arrival rates unless care is taken to prevent several pathologies. These are various forms of receive livelock, in which the system spends all its time processing interrupts, to the exclusion of other necessary tasks. Under extreme conditions, no packets are delivered to the user application or the output of the system. To avoid livelock and related problems, an operating system must schedule network interrupt handling as carefully as it schedules process execution. We modified an interrupt-driven networking implementation to do so; this eliminates receive livelock without degrading other aspects of system performance. We present measurements demonstrating the success of our approach.
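The scheduling idea in this abstract can be illustrated with a toy simulation. The sketch below is not the authors' implementation; it assumes one common way of bounding receive work, namely polling the receive queue with a per-round packet quota and dropping excess arrivals at a bounded queue. The names `poll_with_quota`, `run_rounds`, `quota` and `queue_limit` are hypothetical.

```python
from collections import deque


def poll_with_quota(rx_queue, quota, handle_packet):
    """Handle at most `quota` packets from rx_queue; return the number handled."""
    handled = 0
    while rx_queue and handled < quota:
        handle_packet(rx_queue.popleft())
        handled += 1
    return handled


def run_rounds(arrivals_per_round, rounds, quota, queue_limit):
    """Simulate `rounds` scheduling rounds under a constant arrival rate.

    Returns (packets delivered, packets dropped, rounds in which other
    work got a turn). All parameters are illustrative assumptions.
    """
    rx_queue = deque()
    delivered = []
    dropped = 0
    other_work_done = 0
    next_id = 0
    for _ in range(rounds):
        # Arrivals: the bounded queue drops excess packets before any
        # further processing time is spent on them.
        for _ in range(arrivals_per_round):
            if len(rx_queue) < queue_limit:
                rx_queue.append(next_id)
            else:
                dropped += 1
            next_id += 1
        # Receive processing is capped by the quota ...
        poll_with_quota(rx_queue, quota, delivered.append)
        # ... so other tasks are guaranteed a turn in every round.
        other_work_done += 1
    return len(delivered), dropped, other_work_done
```

Under overload, for example `run_rounds(10, 5, 4, 8)`, the simulated system still delivers a steady four packets per round and other work runs every round, rather than all time going to receive processing.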
Performance issues in parallelized network protocols
In First USENIX Symposium on Operating Systems Design and Implementation, 1994
Cited by 50 (11 self)
Parallel processing has been proposed as a means of improving network protocol throughput. Several different strategies have been taken towards parallelizing protocols. A relatively popular approach is packet-level parallelism, where packets are distributed across processors. This paper provides an experimental performance study of packet-level parallelism on a contemporary shared-memory multiprocessor. We examine several unexplored areas in packet-level parallelism and investigate how various protocol structuring and implementation techniques can affect performance. We study TCP/IP and UDP/IP protocol stacks, implemented with a parallel version of the x-kernel running in user space on Silicon Graphics multiprocessors. Our results show that only limited packet-level parallelism can be achieved within a single connection under TCP, but that using multiple connections can improve available parallelism. We also demonstrate that packet ordering plays a key role in determining single-connection TCP performance, that careful use of locks is a necessity, and that selective exploitation of caching can improve throughput. We also describe experiments that compare parallel protocol performance on two generations of a parallel machine and show how computer architectural trends can influence performance.
An efficient zero-copy I/O framework for UNIX
1995
Cited by 44 (0 self)
Traditional UNIX® I/O interfaces are based on copy semantics, where read and write calls transfer data between the kernel and user-defined buffers. Although simple, copy semantics limit the ability of the operating system to efficiently implement data transfer operations. In this paper, we present extensions on the traditional UNIX interfaces that are based on explicit buffer exchange. Instead of transferring data between user-defined buffers and the kernel, the new extensions transfer data buffers between the user and the kernel. We study using the new interfaces in typical application programs, and compare their use to the standard UNIX interfaces. The new interfaces lend themselves to an efficient zero-copy data transfer implementation. We describe such an implementation in this paper, and we examine its performance. The implementation, done in the context of the Solaris™ operating system, is very efficient: for example, on a typical file transfer benchmark, the network throughput was improved by more than 40% and the CPU utilization reduced by more than 20%.
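The difference between copy semantics and buffer exchange can be shown with a toy model. This is only a sketch of the general idea and does not reproduce the interfaces proposed in the paper; `ToyKernel`, `read_copy`, `read_exchange` and `bytes_copied` are hypothetical names chosen for illustration.

```python
class ToyKernel:
    """Toy stand-in for a kernel that holds one buffer of received data."""

    def __init__(self, payload: bytes):
        self._buf = bytearray(payload)
        self.bytes_copied = 0  # counts bytes moved by copy-semantics reads

    def read_copy(self, user_buf: bytearray) -> int:
        """Copy semantics: data is copied into a buffer the caller supplies."""
        n = min(len(user_buf), len(self._buf))
        user_buf[:n] = self._buf[:n]
        self.bytes_copied += n
        return n

    def read_exchange(self) -> bytearray:
        """Buffer exchange: ownership of the buffer itself is handed over,
        so no payload bytes are copied."""
        buf, self._buf = self._buf, bytearray()
        return buf
```

With `read_copy` the cost grows with the payload size, while `read_exchange` only passes a reference, which is the property a zero-copy implementation relies on.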
Software Support for Outboard Buffering and Checksumming
In Proceedings of the ACM SIGCOMM '95 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, 1995
Cited by 34 (7 self)
Data copying and checksumming are the most expensive operations when doing high-bandwidth network I/O over a high-speed network. Under some conditions, outboard buffering and checksumming can eliminate accesses to the data, thus making communication less expensive and faster. One of the scenarios in which outboard buffering pays off is the common case of applications accessing the network using the Berkeley sockets interface and the Internet protocol stack. In this paper we describe the changes that were made to a BSD protocol stack to make use of a network adaptor that supports outboard buffering and checksumming. Our goal is not only to achieve "single copy" communication for applications that use sockets, but also to have efficient communication for in-kernel applications and for applications using other networks. Performance measurements show that for large reads and writes the single-copy path through the stack is significantly more efficient than the original implementation.
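To see why checksumming is a per-byte cost that an adaptor can usefully take over, consider the standard Internet checksum (the 16-bit one's-complement sum described in RFC 1071). The sketch below is a plain software version for illustration, not code from the paper; the function name `internet_checksum` is an assumption.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement Internet checksum of `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    # Every byte of the payload is read here, which is the data access
    # that outboard checksumming avoids on the host.
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

For the sample bytes `00 01 f2 03 f4 f5 f6 f7` used in RFC 1071, the function returns `0x220d`; appending that checksum to the data and summing again yields zero, which is how a receiver verifies the data.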
Network-Based Multicomputers: An Emerging Parallel Architecture
In Proceedings of Supercomputing '91, 1991
Cited by 22 (3 self)
Multicomputers built around a general network are now a viable alternative to multicomputers based on a system-specific interconnect because of architectural improvements in two areas. First, the host-network interface overhead can be minimized by reducing copy operations and host interrupts. Second, the network can provide high bandwidth and low latency by using high-speed crossbar switches and efficient protocol implementations. While still enjoying the flexibility of general networks, the resulting network-based multicomputers achieve high performance for typical multicomputer applications that use system-specific interconnects. We have developed a network-based multicomputer called Nectar that supports these claims. 1 Introduction Current commercial parallel machines cover a wide spectrum of architectures: shared-memory parallel computers such as the Alliant, Encore, Sequent, and CRAY Y-MP; and distributed-memory computers including MIMD machines such as the Transputer [15], iWarp ...
Design, Implementation, and Evaluation of a Single-Copy Protocol Stack
Software - Practice and Experience, 1998
Cited by 3 (0 self)
Data copying and checksumming are the most expensive operations on hosts performing high-bandwidth network I/O over a high-speed network. Under some conditions, outboard buffering and checksumming can eliminate accesses to the data, thus making communication less expensive and faster. One of the scenarios in which outboard buffering and checksumming pays off is the common case of applications accessing the network using the Berkeley sockets interface and the Internet protocol stack. In this paper we describe the host software for a host interface with outboard buffering and checksumming support. The platform used is DEC Alpha workstations with a Turbochannel I/O bus and running the DEC OSF/1 operating system. Our implementation does not only achieve "single copy" communication for applications that use sockets, but it also interoperates efficiently with in-kernel applications and other network devices. Measurements show that for large reads and writes the single-copy path thr...
TCP/IP on Gigabit Networks
Cited by 2 (0 self)
This paper presents new algorithms that were implemented in Berkeley TCP to lessen the frequency of "congestion collapse" on the Internet. So successful were these algorithms, that all Internet host TCP implementations are required to use them [4]. Indeed, these algorithms have been given credit for saving the Internet from permanent congestion collapse.
The Ethernet supported large 100-node networks in 1976.
go beyond simple benchmark scenarios where line rate communication connects a phony bit source to a phony bit sink, with the CPU saturated. The context for the work was to connect two applications at high speed, leaving CPU DART: Fast Application-level Networking via Data-copy Avoidance Robert J. Walsh The goal of DART is to effectively deliver high-bandwidth performance to the application, without a change to the operating system call semantics. The DART project was started soon after the first DART switch was completed, and also soon after line-rate communication over DART was achieved. In looking forward to gigabit class networks as the next hurdle to conquer, we foresaw a need for an integrated hardware-software project that addressed fundamental memory bandwidth bottleneck issues through a system-level perspective. 1997 IEEE. Reprinted, w

