Tsunami: A High-Speed Rate-Controlled Protocol for File Transfer
- http://steinbeck.ucs.indiana.edu/ mmeiss/papers/tsunami.pdf
Cited by 5 (0 self)
We describe a reliable transfer protocol, Tsunami, designed for faster transfer of large files over high-speed networks than appears possible with standard implementations of TCP. Tsunami is an application-level protocol that features rate control via adjustment of inter-packet delay rather than a sliding-window mechanism. Data blocks are transferred via UDP and control data are transferred via TCP. We also discuss future steps in development of the protocol.
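As a rough illustration of rate control by inter-packet delay (as opposed to a sliding window), the sketch below computes the pause between fixed-size blocks for a target rate and nudges that pause according to the observed loss. It is not taken from the Tsunami paper: the function names, the loss threshold, and the scaling factors are all hypothetical values chosen for the example.

```python
def inter_packet_delay(block_size_bytes: int, target_rate_bps: float) -> float:
    """Seconds to wait between blocks so the send rate matches target_rate_bps."""
    return block_size_bytes * 8 / target_rate_bps


def adjust_delay(delay_s: float, loss_rate: float,
                 loss_threshold: float = 0.01,   # hypothetical threshold
                 slow_down: float = 1.25,        # hypothetical back-off factor
                 speed_up: float = 0.9,          # hypothetical speed-up factor
                 min_delay_s: float = 1e-6) -> float:
    """Lengthen the delay when loss exceeds the threshold, shorten it otherwise."""
    factor = slow_down if loss_rate > loss_threshold else speed_up
    return max(min_delay_s, delay_s * factor)


if __name__ == "__main__":
    # 1250-byte blocks at 1 Gbit/s need one block every 10 microseconds.
    delay = inter_packet_delay(1250, 1e9)
    print(f"initial delay: {delay:.2e} s")
    # The receiver reports 5% loss: the sender backs off by lengthening the delay.
    delay = adjust_delay(delay, loss_rate=0.05)
    print(f"after loss report: {delay:.2e} s")
```

In a sender loop, the data blocks would go out over UDP with a pause of `delay` between them, while the loss reports that drive `adjust_delay` would arrive on the separate TCP control channel the abstract mentions.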
The Case Against User-level Networking
- In Third Workshop on Novel Uses of System Area Networks (SAN-3) (Held in conjunction with HPCA-10)
, 2004
Cited by 5 (0 self)
Extensive research on system support for enabling I/O-intensive applications to achieve performance close to the limits imposed by the hardware suggests two main approaches: low-overhead I/O protocols and the flexibility to customize I/O policies to the needs of applications. One way to achieve both is by supporting user-level access to I/O devices, enabling user-level implementations of I/O protocols. User-level networking is an example of this approach, specific to network interface controllers (NICs). In this paper, we argue that the real key to high performance in I/O-intensive applications is user-level file caching and user-level network buffering, both of which can be achieved without user-level access to NICs. Avoiding the need to support user-level networking carries two important benefits for overall system design: First, a NIC exporting a privileged kernel interface is simpler to design and implement than one exporting a user-level interface. Second, the kernel is reinstated as a global system resource controller and arbitrator. We develop an analytical model of network storage applications and use it to show that their performance is not affected by the use of a kernel-based API to NICs.
Studying Network Protocol Offload With Emulation: Approach And Preliminary Results
- In Proceedings of the 12th Annual IEEE Symposium on High Performance Interconnects
, 2004
Cited by 4 (0 self)
To fully take advantage of high-speed networks while freeing CPU cycles for application processing, the industry is proposing new techniques relying on an extended role of the network interface card, such as TCP Offload Engine and Remote Direct Memory Access. This paper presents an experimental study aimed at collecting the performance data needed to assess these techniques. This work is based on the emulation of an advanced network interface card plugged into the I/O bus. In the experimental setting, a processor of a partitioned SMP machine is dedicated to network processing. Achieving a faithful emulation of a network interface card is one of the main concerns, and it guides the design of the Offload Engine software. This setting has the advantage of being flexible, so that many different offload scenarios can be evaluated. Preliminary throughput results of an emulated TCP Offload Engine demonstrate a large benefit. The emulated TCP Offload Engine indeed yields 600 to 900% improvement while still relying on memory copies at the kernel boundary.
Analyzing NIC Overheads in Network-Intensive Workloads
- In 8th Workshop on Computer Architecture Evaluation using Commercial Workloads (CAECW), Feb 2005
, 2004
Cited by 4 (0 self)
Modern high-bandwidth networks place a significant strain on host I/O subsystems. However, despite the practical ubiquity of TCP/IP over Ethernet for high-speed networking, the vast majority of end-host networking research continues in the current paradigm of the network interface as a generic peripheral device. As a result, proposed optimizations focus on purely software changes, or on moving some of the computation from the primary CPU to the off-chip network interface controller (NIC). We look at an alternative approach: leave the kernel TCP/IP stack unchanged, but eliminate bottlenecks by closer attachment of the NIC to the CPU and memory system.
TCP/IP cache characterization in commercial server workloads
- In Proc. Seventh Workshop on Computer Architecture Evaluation using Commercial Workloads
, 2004
Cited by 4 (1 self)
Internet server applications (such as web servers, e-commerce front-ends, etc.) spend a significant portion of time processing network data. These applications use TCP/IP as the communication protocol, which is known to be very memory intensive. In this paper, we present a simulation-based characterization of the cache/memory access behavior of TCP/IP processing in two popular commercial benchmarks, SPECweb99 and TPC-W. Our SimpleScalar simulator is fed with network traces collected from commercial platforms running these benchmarks under various configurations. We identify the types of data (descriptors, headers, control blocks, etc.) that the TCP/IP stack needs to access while processing packets and analyze the cache behavior, in terms of cache size and locality, for these data. We show that the TCP/IP data falls into two categories: data with temporal locality, such as hash nodes and TCBs, and data with no locality (transient), such as descriptors and payload. Based on the cache characterization study, we propose the usage of a dedicated cache to store and manage TCP/IP data with and without locality. We study various approaches to organizing the network cache and their effects. We show that a small cache, on the order of 5 Kbytes, is sufficient for near-optimal performance of TCP/IP processing, with the additional advantage of minimizing the processor cache pollution. We also touch upon alternative approaches to enable network-friendly cache hierarchies without the need for a dedicated cache structure.
A High Performance Configurable Transport Protocol for Grid Computing
Cited by 4 (2 self)
Grid computing infrastructures and applications are increasingly diverse with networks ranging from very high bandwidth optical networks to wireless networks and applications ranging from remote visualization to sensor data collection. For such environments, standard transport protocols such as TCP and UDP are not always sufficient or optimal given their fixed set of properties and their lack of flexibility. As an alternative, we present H-CTP, a high-performance configurable transport protocol that can be used to build customized transport services for a wide range of Grid computing scenarios. H-CTP is based on an earlier configurable transport protocol called CTP, but with a collection of optimizations that meet the challenge of providing configurability while maintaining performance that meets the requirements of such demanding applications. This paper motivates the need for customizable transport in this area, presents the design of H-CTP, and gives results from performance studies that compare H-CTP with both CTP and TCP. These show, for example, that H-CTP is able to achieve throughput of over 900 Mbps across Gigabit links. Three diverse Grid scenarios are used as example applications and for the H-CTP/TCP comparisons: remote visualization, fast message passing, and sensor grids.
The Performance Potential of an Integrated Network Interface
, 2004
Cited by 3 (2 self)
High-bandwidth TCP/IP networking is a core component of current and future computer systems. Though networking is central to computing today, the vast majority of end-host networking research focuses on the current paradigm of the network interface being merely a peripheral device. Most optimizations focus solely on software changes or on moving some of the computation from the primary CPU to the off-chip network interface controller (NIC). We present an alternative approach for achieving high performance networking. Rather than increasing the complexity of the NIC, we directly integrate a conventional NIC on the CPU die.
Efficient Remote Block-Level I/O over an RDMA-Capable NIC
- In Proceedings, International Conference on Supercomputing (ICS 2006)
Cited by 3 (2 self)
Modern storage systems are required to scale to large storage capacities and I/O throughput in a cost-effective manner. For this reason, they are increasingly being built out of commodity components, mainly PCs equipped with large numbers of disks and interconnected by high-performance system area networks. A main issue in these efforts is to achieve high I/O throughput over commodity, low-cost system area networks and commodity operating systems. In this work, we examine in detail the performance of remote block-level storage I/O over commodity, RDMA-capable network interfaces and networks. We examine the support that is required from the network interface for achieving high throughput. We also examine in detail the overheads associated with kernel-level protocols for networked storage access. We find that base system performance is limited by (a) interrupt cost, (b) request size, and (c) protocol message size. We examine the impact of techniques to alleviate these factors and find that our techniques combined can improve throughput by up to 100% over a simpler unoptimized configuration. Our current prototype is able to achieve a throughput of about 200 MBytes/s over a network that is capable of delivering about 500 MBytes/s. We identify major limiting factors, mostly at the I/O target-side.
Federated DAFS: Scalable Cluster-Based Direct Access File Servers
- In Proc. 2nd Workshop on Novel Uses of System Area Networks (SAN-2)
, 2003
Cited by 3 (1 self)
Protocols like the Direct Access File System (DAFS) leverage user-level memory-mapped communication to enable low overhead access to network-attached storage for applications. DAFS offers significant improvement in application performance using features like direct data transfer and RDMA. Our goal is to build high performance network file servers using DAFS. The benefits of the DAFS protocol can be extended to cluster-based servers, using low overhead user-level communication within the cluster.
Benefits of I/O Acceleration Technology (I/OAT) in Clusters
Cited by 3 (1 self)
Packet processing in the TCP/IP stack at multi-Gigabit data rates occupies a significant portion of the system overhead. Though there are several techniques to reduce the packet processing overhead on the sender-side, the receiver-side remains a bottleneck. I/O Acceleration Technology (I/OAT), developed by Intel, is a set of features particularly designed to reduce the receiver-side packet processing overhead. This paper studies the benefits of I/OAT through extensive micro-benchmark evaluations as well as evaluations on two different application domains: (1) a multitier data-center environment and (2) a Parallel Virtual File System (PVFS). Our micro-benchmark evaluations show that I/OAT results in 38% lower overall CPU utilization in comparison with traditional communication. Due to this reduced CPU utilization, I/OAT delivers better performance and increased network bandwidth. Our experimental results with data-centers and file systems reveal that I/OAT can improve the total number of transactions processed by 14% and throughput by 12%, respectively. In addition, I/OAT can sustain a large number of concurrent threads (up to a factor of four as compared to non-I/OAT) in data-center environments, thus increasing the scalability of the servers.

