ServerSwitch: A Programmable and High Performance Platform for Data Center Networks
, 2011
Cited by 7 (1 self)
As one of the fundamental infrastructures for cloud computing, data center networks (DCN) have recently been studied extensively. We currently use pure software-based systems, FPGA-based platforms (e.g., NetFPGA), or OpenFlow switches to implement and evaluate various DCN designs, including topology design, control plane and routing, and congestion control. However, software-based approaches suffer from high CPU overhead and processing latency; FPGA-based platforms are difficult to program and incur high cost; and OpenFlow focuses on control plane functions at present. In this paper, we design a ServerSwitch to address the above problems. ServerSwitch is motivated by the observation that commodity Ethernet switching chips are becoming programmable and that the PCI-E interface provides high throughput and low latency between the server CPU and I/O subsystem. ServerSwitch uses a commodity switching chip for various customized packet forwarding, and leverages the server CPU for control and data plane packet processing, due to the low latency and high throughput between the switching chip and server CPU. We have built our ServerSwitch at low cost. Our experiments demonstrate that ServerSwitch is fully programmable and achieves high performance. Specifically, we have implemented various forwarding schemes including source routing in hardware. Our in-network caching experiment showed high throughput and flexible data processing. Our QCN (Quantized Congestion Notification) implementation further demonstrated that ServerSwitch can react to network congestion in 23 µs. ∗ This work was performed when Zhiqiang Zhou was a visiting student at Microsoft Research Asia.
SSLShader: cheap SSL acceleration with commodity processors
- In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, NSDI’11
, 2011
Cited by 5 (0 self)
Secure end-to-end communication is becoming increasingly important as more private and sensitive data is transferred on the Internet. Unfortunately, today’s SSL deployment is largely limited to security- or privacy-critical domains. The low adoption rate is mainly attributed to the heavy cryptographic computation overhead on the server side, and the cost of good privacy on the Internet is tightly bound to expensive hardware SSL accelerators in practice. In this paper we present high-performance SSL acceleration using commodity processors. First, we show that modern graphics processing units (GPUs) can be easily converted to general-purpose SSL accelerators. By exploiting the massive computing parallelism of GPUs, we accelerate SSL cryptographic operations beyond what state-of-the-art CPUs provide. Second, we build a transparent SSL proxy, SSLShader, that carefully leverages the trade-offs of recent hardware features such as AES-NI and NUMA and achieves both high throughput and low latency. In our evaluation, the GPU implementation of RSA shows a factor of 22.6 to 31.7 improvement over the fastest CPU implementation. SSLShader achieves 29K transactions per second for small files while it transfers large files at 13 Gbps on a commodity server machine. These numbers are comparable to high-end commercial SSL appliances at a fraction of their price.
Building extensible networks with rule-based forwarding
- In OSDI
, 2010
Cited by 4 (1 self)
We present a network design that provides flexible and policy-compliant forwarding. Our proposal centers around a new architectural concept: that of packet rules. A rule is a simple if-then-else construct that describes the manner in which the network should – or should not – forward packets. A packet identifies the rule by which it is to be forwarded and routers forward each packet in accordance with its associated rule. Each packet rule is certified, guaranteeing that all parties involved in forwarding a packet agree with the packet’s rule. Packets containing uncertified rules are simply dropped in the network. We present the design, implementation and evaluation of a Rule-Based Forwarding (RBF) architecture. We demonstrate flexibility by illustrating how RBF supports a variety of use cases including content caching, middlebox selection and DDoS protection. Using our prototype router implementation we show that the overhead RBF imposes is within the capabilities of modern network equipment.
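The forwarding behaviour this abstract describes (a packet carries an if-then-else rule, routers follow it, and packets with uncertified rules are dropped) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the RBF paper: the `Rule` and `Packet` types, the header field names, and the HMAC-based `certify` function, which merely stands in for whatever certification scheme RBF actually uses.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional

# Hypothetical shared key standing in for a real certification authority.
CERT_KEY = b"demo-certifier-key"


@dataclass(frozen=True)
class Rule:
    """If headers[field] == value, forward to then_hop, else to else_hop.

    A hop of None means "do not forward" (drop).
    """
    field: str
    value: str
    then_hop: Optional[str]
    else_hop: Optional[str]

    def encode(self) -> bytes:
        return repr((self.field, self.value, self.then_hop, self.else_hop)).encode()


def certify(rule: Rule) -> bytes:
    """Toy certificate: an HMAC over the encoded rule (an assumption, not RBF's scheme)."""
    return hmac.new(CERT_KEY, rule.encode(), hashlib.sha256).digest()


@dataclass
class Packet:
    headers: dict
    rule: Rule
    certificate: bytes


def forward(packet: Packet) -> Optional[str]:
    """Return the next hop chosen by the packet's rule, or None if dropped."""
    expected = certify(packet.rule)
    if not hmac.compare_digest(expected, packet.certificate):
        return None  # uncertified rule: the packet is dropped
    if packet.headers.get(packet.rule.field) == packet.rule.value:
        return packet.rule.then_hop
    return packet.rule.else_hop
```

For example, a rule such as `Rule("content", "cached", "cache-node", "origin")` would steer matching packets to a cache and all others to the origin, loosely mirroring the content-caching use case the abstract mentions.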
Building a single-box 100 Gbps software router
- In IEEE Workshop on Local and Metropolitan Area Networks
, 2010
Cited by 3 (2 self)
great leaps in terms of CPU, memory, and I/O bus speeds. Benefiting from the hardware innovation, recent software routers on commodity PCs now report about 10 Gbps in packet routing. In this paper we map out expected hurdles and projected speed-ups to reach 100 Gbps in packet routing on a single commodity PC. With careful measurements, we identify two notable bottlenecks for our goal: CPU cycles and I/O bandwidth. For the former, we propose reducing per-packet processing overhead with software-level optimizations and buying extra computing power with GPUs. To improve the I/O bandwidth, we suggest scaling the performance of I/O hubs, which limits packet routing speed to well below 50 Gbps.
PTask: Operating system abstractions to manage GPUs as compute devices
- Carnegie Mellon University
, 2011
Cited by 2 (0 self)
We propose a new set of OS abstractions to support GPUs and other accelerator devices as first-class computing resources. These new abstractions, collectively called the PTask API, support a dataflow programming model. Because a PTask graph consists of OS-managed objects, the kernel has sufficient visibility and control to provide system-wide guarantees like fairness and performance isolation, and can streamline data movement in ways that are impossible under current GPU programming models. Our experience developing the PTask API, along with a gestural interface on Windows 7 and a FUSE-based encrypted file system on Linux, shows that the PTask API can provide important system-wide guarantees where there were previously none, and can enable significant performance improvements, for example gaining a 5× improvement in maximum throughput for the gestural interface. Categories and Subject Descriptors D.4.8 [Operating systems]: [Performance]; D.4.7 [Operating systems]: [Organization and Design]
A Cost Comparison of Data Center Network Architectures
Cited by 1 (0 self)
There is a growing body of research exploring new network architectures for the data center. These proposals all seek to improve the scalability and cost-effectiveness of current data center networks, but adopt very different approaches to doing so. For example, some proposals build networks entirely out of switches while others do so using a combination of switches and servers. How do these different network architectures compare? For that matter, by what metrics should we even begin to compare these architectures? Understanding the tradeoffs between different approaches is important both for operators making deployment decisions and to guide future research. In this paper, we take a first step toward understanding the tradeoffs between different data center network architectures. We use high-level models of different classes of data center networks and compare them on cost using both current and predicted trends in cost and power consumption.
Toward Predictable Performance in Software Packet-Processing Platforms
Cited by 1 (1 self)
To become a credible alternative to specialized hardware, general-purpose networking needs to offer not only flexibility, but also predictable performance. Recent projects have demonstrated that general-purpose multicore hardware is capable of high-performance packet processing, but under a crucial simplifying assumption of uniformity: all processing cores see the same type/amount of traffic and run identical code, while all packets incur the same type of conventional processing (e.g., IP forwarding). Instead, we present a general-purpose packet-processing system that combines ease of programmability with predictable performance, while running a diverse set of applications and serving multiple clients with different needs. Offering predictability in this context is considered a hard problem because software processes contend for shared hardware resources (caches, memory controllers, buses) in unpredictable ways. Still, we show that, in our system, (a) the way in which resource contention affects performance is predictable and (b) the overall performance depends little on how different processes are scheduled on different cores. To the best of our knowledge, our results constitute the first evidence that, when designing software network equipment, flexibility and predictability are not mutually exclusive goals.
Small Cache, Big Effect: Provable Load Balancing for Randomly Partitioned Cluster Services
Load balancing requests across a cluster of back-end servers is critical for avoiding performance bottlenecks and meeting service-level objectives (SLOs) in large-scale cloud computing services. This paper shows how a small, fast popularity-based front-end cache can ensure load balancing for an important class of such services; furthermore, we prove an O(n log n) lower bound on the necessary cache size and show that this size depends only on the total number of back-end nodes n, not the number of items stored in the system. We validate our analysis through simulation and empirical results running a key-value storage system on an 85-node cluster.
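The effect this abstract describes can be explored with a small simulation sketch in Python. It is not the authors' simulator: the function names, the seeded random partitioning, and the constant factor and natural logarithm used to turn "O(n log n)" into a concrete cache size are all assumptions chosen for illustration.

```python
import math
import random


def n_log_n_cache_size(n_nodes, constant=1.0):
    """Cache size on the order of n log n; the constant is a hypothetical knob."""
    return math.ceil(constant * n_nodes * math.log(n_nodes))


def backend_loads(rates, n_nodes, cache_size, seed=0):
    """Return per-node request rates after a popularity-based front-end cache.

    rates maps each key to its request rate. Every key is assigned to a
    uniformly random back-end node (random partitioning). The cache_size
    most popular keys are served by the front end and add no back-end load.
    """
    rng = random.Random(seed)
    by_popularity = sorted(rates, key=rates.get, reverse=True)
    loads = [0.0] * n_nodes
    for rank, key in enumerate(by_popularity):
        node = rng.randrange(n_nodes)  # same placement regardless of caching
        if rank >= cache_size:         # cached keys never reach the back end
            loads[node] += rates[key]
    return loads
```

Comparing `max(backend_loads(rates, n, 0))` against `max(backend_loads(rates, n, n_log_n_cache_size(n)))` for a skewed `rates` dictionary gives a rough feel for how much a cache sized by the node count alone can flatten the hottest node's load.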
XIA: Efficient Support for Evolvable Internetworking
Motivated by limitations in today’s host-centric IP network, recent studies have proposed clean-slate network architectures centered around alternate first-class principals, such as content, services, or users. However, much like the host-centric IP design, elevating one principal type above others hinders communication between other principals and inhibits the network’s capability to evolve. This paper presents the eXpressive Internet Architecture (XIA), an architecture with native support for multiple principals and the ability to evolve its functionality to accommodate new, as yet unforeseen, principals over time. We describe key design requirements, and demonstrate how XIA’s rich addressing and forwarding semantics facilitate flexibility and evolvability, while keeping core network functions simple and efficient. We describe case studies that demonstrate key functionality XIA enables.
Offset Addressing Approach to Memory-Efficient IP Address Lookup
This paper presents a novel offset encoding scheme for memory-efficient IP address lookup, called Offset Encoded Trie (OET). Each node in the OET contains only a next hop bitmap and an offset value, without the child pointers and the next hop pointers. Each traversal node uses the next hop bitmap and the offset value as two offsets to determine the location address of the next node to be searched. The on-chip OET is searched to find the longest matching prefix, and then the prefix is used as a key to retrieve the corresponding next hop from an off-chip prefix hash table. Experiments on real IP forwarding tables show that the OET outperforms previous multi-bit trie schemes in terms of memory consumption. The OET facilitates far more effective use of on-chip memory for faster IP address lookup.
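The two-stage lookup this abstract describes (first find the longest matching prefix, then use that prefix as a key into a separate hash table of next hops) can be sketched in Python as follows. This sketch deliberately does not implement the OET's bitmap-and-offset node encoding, which the abstract does not specify in enough detail. A plain binary trie that stores no next hops stands in for the on-chip structure, a `dict` stands in for the off-chip prefix hash table, and the names `PrefixTrie` and `TwoStageTable` are hypothetical. It handles IPv4 only.

```python
import ipaddress


def prefix_bits(cidr):
    """Bit string of an IPv4 prefix, e.g. '10.0.0.0/8' -> '00001010'."""
    net = ipaddress.ip_network(cidr)
    return format(int(net.network_address), "032b")[: net.prefixlen]


def address_bits(addr):
    """32-character bit string of an IPv4 address."""
    return format(int(ipaddress.IPv4Address(addr)), "032b")


class PrefixTrie:
    """Plain binary trie recording which prefixes exist (no next hops stored)."""

    def __init__(self):
        self.root = {}

    def insert(self, bits):
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["end"] = True  # marks that a prefix ends at this node

    def longest_match(self, bits):
        node = self.root
        best = "" if node.get("end") else None  # "" is the default route
        for depth, bit in enumerate(bits, start=1):
            node = node.get(bit)
            if node is None:
                break
            if node.get("end"):
                best = bits[:depth]
        return best


class TwoStageTable:
    def __init__(self):
        self.trie = PrefixTrie()  # stage 1: stand-in for the on-chip trie
        self.next_hops = {}       # stage 2: stand-in for the off-chip hash table

    def add_route(self, cidr, next_hop):
        bits = prefix_bits(cidr)
        self.trie.insert(bits)
        self.next_hops[bits] = next_hop

    def lookup(self, addr):
        prefix = self.trie.longest_match(address_bits(addr))
        return None if prefix is None else self.next_hops[prefix]
```

Keeping next hops out of the trie is the design point being illustrated: the first stage only has to answer "which prefix matches", so it can be kept compact, while the larger next-hop data is fetched with a single keyed lookup afterwards.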

