Results 1 - 10 of 26
IX: A Protected Dataplane Operating System for High Throughput and Low Latency
In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14)
"... The conventional wisdom is that aggressive networking requirements, such as high packet rates for small mes-sages and microsecond-scale tail latency, are best ad-dressed outside the kernel, in a user-level networking stack. We present IX, a dataplane operating system that provides high I/O performan ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
(Show Context)
Abstract: The conventional wisdom is that aggressive networking requirements, such as high packet rates for small messages and microsecond-scale tail latency, are best addressed outside the kernel, in a user-level networking stack. We present IX, a dataplane operating system that provides high I/O performance, while maintaining the key advantage of strong protection offered by existing kernels. IX uses hardware virtualization to separate management and scheduling functions of the kernel (control plane) from network processing (dataplane). The dataplane architecture builds upon a native, zero-copy API and optimizes for both bandwidth and latency by dedicating hardware threads and networking queues to dataplane instances, processing bounded batches of packets to completion, and by eliminating coherence traffic and multi-core synchronization. We demonstrate that IX outperforms Linux and state-of-the-art, user-space network stacks significantly in both throughput and end-to-end latency. Moreover, IX improves the throughput of a widely deployed, key-value store by up to 3.6× and reduces tail latency by more than 2×.
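To make the dataplane model concrete, here is a minimal, single-threaded C sketch of the bounded-batch, run-to-completion loop the abstract describes, with simulated NIC hooks. The names and stubs are hypothetical and do not reflect IX's actual zero-copy API.

/* Toy sketch: bounded-batch, run-to-completion packet processing.
 * In the real design, each hardware thread owns its own RX/TX queues,
 * so this loop needs no locks and generates no coherence traffic. */
#include <stdio.h>
#include <stddef.h>

#define BATCH_LIMIT 64                 /* bounded batch: cap packets per iteration */

struct pkt { int id; };

/* Stand-in for pulling packets from the NIC queue owned by this thread. */
static size_t nic_rx_poll(struct pkt *batch, size_t max)
{
    static int produced = 0;
    const int total = 200;             /* simulate a finite burst of traffic */
    size_t n = 0;
    while (n < max && produced < total)
        batch[n++].id = produced++;
    return n;
}

static void app_handle(struct pkt *p)  /* protocol + application work */
{
    printf("handled packet %d\n", p->id);
}

static void nic_tx_flush(void)
{
    /* responses generated by app_handle() would be pushed to the NIC here */
}

int main(void)
{
    struct pkt batch[BATCH_LIMIT];
    size_t n;

    /* Take a bounded batch, process it to completion, then return to the NIC.
     * Bounding the batch keeps tail latency predictable; batching amortizes
     * per-packet costs. */
    while ((n = nic_rx_poll(batch, BATCH_LIMIT)) > 0) {
        for (size_t i = 0; i < n; i++)
            app_handle(&batch[i]);
        nic_tx_flush();
    }
    return 0;
}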
Queues don’t matter when you can JUMP them!
"... QJUMP is a simple and immediately deployable ap-proach to controlling network interference in datacenter networks. Network interference occurs when congestion from throughput-intensive applications causes queueing that delays traffic from latency-sensitive applications. To mitigate network interfere ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
Abstract: QJUMP is a simple and immediately deployable approach to controlling network interference in datacenter networks. Network interference occurs when congestion from throughput-intensive applications causes queueing that delays traffic from latency-sensitive applications. To mitigate network interference, QJUMP applies Internet QoS-inspired techniques to datacenter applications. Each application is assigned to a latency sensitivity level (or class). Packets from higher levels are rate-limited in the end host, but once allowed into the network can “jump-the-queue” over packets from lower levels. In settings with known node counts and link speeds, QJUMP can support service levels ranging from strictly bounded latency (but with low rate) through to line-rate throughput (but with high latency variance). We have implemented QJUMP as a Linux Traffic Control module. We show that QJUMP achieves bounded latency and reduces in-network interference by up to
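QJUMP itself ships as a Linux Traffic Control module; the toy C sketch below only illustrates the per-level rate-limiting idea (a token bucket per latency-sensitivity level, so that admitted packets can safely take priority in the network). The rates, bucket sizes, and function names are illustrative assumptions, not QJUMP's implementation.

/* Toy sketch: one token bucket per latency-sensitivity level. A packet is
 * admitted at its level's priority only if tokens remain; otherwise it must
 * wait. Higher levels get lower rates but, once admitted, jump queues. */
#include <stdio.h>

struct level {
    double rate_bps;     /* rate limit for this level */
    double burst_bytes;  /* bucket depth */
    double tokens;       /* current tokens, in bytes */
    double last_s;       /* time of last refill, seconds */
};

/* Refill the bucket for elapsed time and decide whether a packet of
 * pkt_bytes may be sent now at this level's priority. */
static int admit(struct level *l, double now_s, double pkt_bytes)
{
    l->tokens += (now_s - l->last_s) * (l->rate_bps / 8.0);
    if (l->tokens > l->burst_bytes)
        l->tokens = l->burst_bytes;
    l->last_s = now_s;

    if (l->tokens >= pkt_bytes) {
        l->tokens -= pkt_bytes;
        return 1;                /* admitted: may take priority in the fabric */
    }
    return 0;                    /* over its rate: held at the end host */
}

int main(void)
{
    /* Highest level: strictly bounded latency but low rate (1 Mb/s here). */
    struct level latency_sensitive = { 1e6, 1500.0, 1500.0, 0.0 };

    for (int ms = 0; ms < 50; ms += 5) {
        double now = ms / 1000.0;
        printf("t=%2dms 1500B packet %s\n", ms,
               admit(&latency_sensitive, now, 1500.0) ? "admitted" : "held");
    }
    return 0;
}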
NBA (Network Balancing Act): A High-performance Packet Processing Framework for Heterogeneous Processors
"... We present the NBA framework, which extends the ar-chitecture of the Click modular router to exploit mod-ern hardware, adapts to different hardware configurations, and reaches close to their maximum performance with-out manual optimization. NBA takes advantages of exist-ing performance-excavating so ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Abstract: We present the NBA framework, which extends the architecture of the Click modular router to exploit modern hardware, adapts to different hardware configurations, and reaches close to their maximum performance without manual optimization. NBA takes advantage of existing performance-excavating solutions such as batch processing, NUMA-aware memory management, and receive-side scaling with multi-queue network cards. Its abstraction resembles Click but also hides the details of architecture-specific optimization, batch processing that handles the path diversity of individual packets, CPU/GPU load balancing, and complex hardware resource mappings due to multi-core CPUs and multi-queue network cards. We have implemented four sample applications: an IPv4 and an IPv6 router, an IPsec encryption gateway, and an intrusion detection system (IDS) with Aho-Corasick and regular expression matching. The IPv4/IPv6 router performance reaches the line rate on a commodity 80 Gbps machine, and the performance of the IPsec gateway and the IDS reaches above 30 Gbps. We also show that our adaptive CPU/GPU load balancer reaches near-optimal throughput in various combinations of sample applications and traffic conditions.
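The following toy C sketch illustrates one way an adaptive CPU/GPU load balancer of this kind could behave, nudging the GPU offload fraction toward whichever path has the smaller backlog. The epoch model, capacities, and step size are invented for illustration and are not NBA's actual algorithm.

/* Toy sketch: adapt the fraction of batches offloaded to the GPU based on
 * which path is falling behind. Real systems would measure throughput or
 * queue occupancy from worker threads instead of simulating it. */
#include <stdio.h>

int main(void)
{
    double gpu_share = 0.5;                    /* fraction of batches offloaded */
    const double step = 0.05;
    double cpu_backlog = 0, gpu_backlog = 0;   /* queued batches per path */

    for (int epoch = 0; epoch < 12; epoch++) {
        double arriving = 100.0;               /* batches arriving this epoch */
        double cpu_capacity = 40.0;            /* batches each path drains per epoch */
        double gpu_capacity = 70.0;

        cpu_backlog += arriving * (1.0 - gpu_share) - cpu_capacity;
        gpu_backlog += arriving * gpu_share - gpu_capacity;
        if (cpu_backlog < 0) cpu_backlog = 0;
        if (gpu_backlog < 0) gpu_backlog = 0;

        /* Move work toward the path with the smaller backlog. */
        if (cpu_backlog > gpu_backlog)      gpu_share += step;
        else if (gpu_backlog > cpu_backlog) gpu_share -= step;
        if (gpu_share < 0.0) gpu_share = 0.0;
        if (gpu_share > 1.0) gpu_share = 1.0;

        printf("epoch %2d: cpu_q=%5.1f gpu_q=%5.1f gpu_share=%.2f\n",
               epoch, cpu_backlog, gpu_backlog, gpu_share);
    }
    return 0;
}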
Fast Userspace Packet Processing
"... In recent years, we have witnessed the emergence of high speed packet I/O frameworks, bringing unprecedented net-work performance to userspace. Using the Click modular router, we first review and quantitatively compare several such packet I/O frameworks, showing their superiority to kernel-based for ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Abstract: In recent years, we have witnessed the emergence of high speed packet I/O frameworks, bringing unprecedented network performance to userspace. Using the Click modular router, we first review and quantitatively compare several such packet I/O frameworks, showing their superiority to kernel-based forwarding. We then reconsider the issue of software packet processing, in the context of modern commodity hardware with hardware multi-queues, multi-core processors and non-uniform memory access. Through a combination of existing techniques and improvements of our own, we derive modern general principles for the design of software packet processors. Our implementation of a fast packet processor framework, integrating a faster Click with both Netmap and DPDK, exhibits up to about 2.3x speed-up compared to other software implementations, when used as an IP router.
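One recurring principle behind these frameworks is avoiding per-packet heap allocation by recycling packets through a preallocated buffer pool. The self-contained C sketch below illustrates that idea with a hypothetical pool; it is not the actual netmap or DPDK buffer scheme.

/* Toy sketch: packets live in buffers allocated once (typically per core)
 * and recycled via a free list, so the fast path never calls malloc/free. */
#include <stdio.h>
#include <stddef.h>

#define POOL_SIZE 8
#define PKT_BYTES 2048

struct pktbuf {
    struct pktbuf *next;               /* free-list link */
    unsigned char data[PKT_BYTES];
};

static struct pktbuf pool[POOL_SIZE];  /* allocated once */
static struct pktbuf *free_list;

static void pool_init(void)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

static struct pktbuf *pkt_alloc(void)  /* O(1), no heap allocation */
{
    struct pktbuf *b = free_list;
    if (b) free_list = b->next;
    return b;
}

static void pkt_free(struct pktbuf *b) /* recycle instead of freeing */
{
    b->next = free_list;
    free_list = b;
}

int main(void)
{
    pool_init();
    for (int i = 0; i < 20; i++) {     /* simulate receive -> process -> release */
        struct pktbuf *b = pkt_alloc();
        if (!b) { puts("pool exhausted (would drop packet)"); continue; }
        b->data[0] = (unsigned char)i; /* "process" the packet */
        pkt_free(b);
    }
    puts("done");
    return 0;
}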
Towards High-Performance Application-Level Storage Management
"... We propose a radical re-architecture of the traditional operating system storage stack to move the kernel off the data path. Leveraging virtualized I/O hardware for disk and flash storage, most read and write I/O operations go directly to application code. The kernel dynamically allocates extents, m ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Abstract: We propose a radical re-architecture of the traditional operating system storage stack to move the kernel off the data path. Leveraging virtualized I/O hardware for disk and flash storage, most read and write I/O operations go directly to application code. The kernel dynamically allocates extents, manages the virtual to physical binding, and performs name translation. The benefit is to dramatically reduce the CPU overhead of storage operations while improving application flexibility.
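A minimal C sketch of the proposed split, assuming a hypothetical interface: a (simulated) kernel call allocates an extent and establishes the binding once, after which the application performs reads and writes directly, with no kernel involvement on the data path. The names and the in-memory "virtual disk" are illustrative only.

/* Toy sketch: control path allocates and binds an extent; data path is
 * direct application access with no system call per operation. */
#include <stdio.h>
#include <string.h>

#define DISK_BYTES (1 << 16)
static unsigned char virtual_disk[DISK_BYTES];   /* stands in for virtualized storage */

struct extent { size_t offset, length; };

/* Control path (the kernel in the real design): allocate space once and
 * return a binding the application uses directly from then on. */
static struct extent kernel_alloc_extent(size_t length)
{
    static size_t next = 0;
    struct extent e = { next, length };
    next += length;
    return e;
}

/* Data path (application code): direct access to the bound extent. */
static void app_write(struct extent e, size_t off, const void *buf, size_t len)
{
    memcpy(virtual_disk + e.offset + off, buf, len);
}

static void app_read(struct extent e, size_t off, void *buf, size_t len)
{
    memcpy(buf, virtual_disk + e.offset + off, len);
}

int main(void)
{
    struct extent log = kernel_alloc_extent(4096);   /* one control-path call */
    char out[16] = "record-0", in[16] = {0};

    app_write(log, 0, out, sizeof out);              /* many data-path operations */
    app_read(log, 0, in, sizeof in);
    printf("read back: %s\n", in);
    return 0;
}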
sRoute: Treating the Storage Stack Like a Network
"... Abstract In a data center, an IO from an application to distributed storage traverses not only the network, but also several software stages with diverse functionality. This set of ordered stages is known as the storage or IO stack. Stages include caches, hypervisors, IO schedulers, file systems, a ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract: In a data center, an IO from an application to distributed storage traverses not only the network, but also several software stages with diverse functionality. This set of ordered stages is known as the storage or IO stack. Stages include caches, hypervisors, IO schedulers, file systems, and device drivers. Indeed, in a typical data center, the number of these stages is often larger than the number of network hops to the destination. Yet, while packet routing is fundamental to networks, no notion of IO routing exists on the storage stack. The path of an IO to an endpoint is predetermined and hard-coded. This forces IO with different needs (e.g., requiring different caching or replica selection) to flow through a one-size-fits-all IO stack structure, resulting in an ossified IO stack. This paper proposes sRoute, an architecture that provides a routing abstraction for the storage stack. sRoute comprises a centralized control plane and "sSwitches" on the data plane. The control plane sets the forwarding rules in each sSwitch to route IO requests at runtime based on application-specific policies. A key strength of our architecture is that it works with unmodified applications and VMs. This paper shows significant benefits of customized IO routing to data center tenants (e.g., a factor of ten for tail IO latency, more than 60% better throughput for a customized replication protocol and a factor of two in throughput for customized caching).
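The toy C sketch below illustrates the routing abstraction at its simplest: a control plane installs forwarding rules, and an sSwitch-like function matches each IO against them to choose its next stage. The rule format and endpoints are hypothetical, purely to show the control-plane/data-plane split.

/* Toy sketch: rule-based IO routing. The control plane installs rules;
 * the data plane matches each IO and forwards it to the chosen stage. */
#include <stdio.h>
#include <string.h>

struct io   { const char *tenant; char op; };        /* op: 'R' read, 'W' write */
struct rule { const char *tenant; char op; const char *endpoint; };

#define MAX_RULES 16
static struct rule rules[MAX_RULES];
static int nrules;

/* Control plane: turn an application-specific policy into a forwarding rule. */
static void install_rule(const char *tenant, char op, const char *endpoint)
{
    if (nrules < MAX_RULES)
        rules[nrules++] = (struct rule){ tenant, op, endpoint };
}

/* Data plane (sSwitch): first matching rule wins; otherwise the default path. */
static const char *route(struct io io)
{
    for (int i = 0; i < nrules; i++)
        if (strcmp(rules[i].tenant, io.tenant) == 0 && rules[i].op == io.op)
            return rules[i].endpoint;
    return "default-stack";
}

int main(void)
{
    install_rule("tenantA", 'R', "local-cache");     /* reads served from a cache */
    install_rule("tenantA", 'W', "replica-group-2"); /* writes use a custom replica set */

    struct io reqs[] = { {"tenantA", 'R'}, {"tenantA", 'W'}, {"tenantB", 'R'} };
    for (int i = 0; i < 3; i++)
        printf("%s %c -> %s\n", reqs[i].tenant, reqs[i].op, route(reqs[i]));
    return 0;
}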
Unblinding the OS to Optimize User-Perceived Flash SSD Latency
"... Abstract In this paper, we present a flash solid-state drive (SSD) optimization that provides hints of SSD internal behaviors, such as device I/O time and buffer activities, to the OS in order to mitigate the impact of I/O completion scheduling delays. The hints enable the OS to make reliable lat ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract: In this paper, we present a flash solid-state drive (SSD) optimization that provides hints of SSD internal behaviors, such as device I/O time and buffer activities, to the OS in order to mitigate the impact of I/O completion scheduling delays. The hints enable the OS to make reliable latency predictions for each I/O request, so that the OS can make an accurate scheduling decision on whether to yield the CPU or busy-wait for completion, ultimately improving user-perceived I/O performance. This was achieved by implementing latency predictors supported by an SSD I/O behavior tracker within the SSD that tracks I/O behavior at the level of internal resources, such as DRAM buffers or NAND chips. Evaluations with an SSD prototype based on a Xilinx Zynq-7000 FPGA and MLC flash chips showed that our optimizations enabled the OS to mask the scheduling delays without severely impacting system parallelism, compared to prior I/O completion methods.
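A minimal C sketch of the scheduling decision these hints enable, under the assumption that busy-waiting pays off only when the predicted completion time is below roughly the cost of blocking and being rescheduled. The threshold and the predicted latencies are made-up illustrative numbers, not values from the paper.

/* Toy sketch: choose between busy-waiting and yielding based on the
 * device's predicted completion latency for this I/O. */
#include <stdio.h>

#define SWITCH_COST_US 15.0   /* assumed cost of sleeping + wakeup (illustrative) */

static const char *completion_policy(double predicted_us)
{
    return (predicted_us <= SWITCH_COST_US) ? "busy-wait" : "yield/block";
}

int main(void)
{
    /* Predicted latencies a device-side tracker might report: a buffered
     * (DRAM) hit, a NAND read, and a read stuck behind internal housekeeping. */
    double predicted[] = { 3.0, 70.0, 900.0 };

    for (int i = 0; i < 3; i++)
        printf("predicted %6.1f us -> %s\n", predicted[i],
               completion_policy(predicted[i]));
    return 0;
}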
StackMap: Low-Latency Networking with the OS Stack and Dedicated NICs
"... Abstract StackMap leverages the best aspects of kernel-bypass networking into a new low-latency Linux network service based on the full-featured TCP kernel implementation, by dedicating network interfaces to applications and offering an extended version of the netmap API as a zero-copy, lowoverhead ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract: StackMap leverages the best aspects of kernel-bypass networking into a new low-latency Linux network service based on the full-featured TCP kernel implementation, by dedicating network interfaces to applications and offering an extended version of the netmap API as a zero-copy, low-overhead data path while retaining the socket API for the control path. For small-message, transactional workloads, StackMap outperforms baseline Linux by 4 to 80% in latency and 4 to 391% in throughput. It also achieves comparable performance with Seastar, a highly-optimized user-level TCP/IP stack for DPDK.
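The toy C sketch below illustrates the data-path idea in isolation: descriptors in a shared ring point into shared packet buffers, so the application reads payloads in place instead of copying them through read(). The ring and buffer layout here are hypothetical and are not the netmap or StackMap API; connection setup would still go through ordinary socket calls on the control path.

/* Toy sketch: a zero-copy receive ring. The stack fills shared buffers and
 * publishes (index, length) descriptors; the application consumes the
 * descriptors and reads the data in place. */
#include <stdio.h>
#include <string.h>

#define RING_SLOTS 4
#define BUF_BYTES  256

static char bufs[RING_SLOTS][BUF_BYTES];   /* buffers shared by stack and app */

struct slot { int buf_idx; int len; };
struct ring { struct slot slots[RING_SLOTS]; int head, tail; };

static void ring_push(struct ring *r, int buf_idx, int len)   /* stack side */
{
    r->slots[r->tail % RING_SLOTS] = (struct slot){ buf_idx, len };
    r->tail++;
}

static int ring_pop(struct ring *r, struct slot *out)         /* application side */
{
    if (r->head == r->tail) return 0;
    *out = r->slots[r->head % RING_SLOTS];
    r->head++;
    return 1;
}

int main(void)
{
    struct ring rx = {0};

    /* The "stack" places two received payloads directly into shared buffers. */
    strcpy(bufs[0], "GET k1"); ring_push(&rx, 0, 6);
    strcpy(bufs[1], "GET k2"); ring_push(&rx, 1, 6);

    /* The application consumes descriptors and reads the payloads in place. */
    struct slot s;
    while (ring_pop(&rx, &s))
        printf("request (%d bytes): %.*s\n", s.len, s.len, bufs[s.buf_idx]);
    return 0;
}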