Results 1 - 10 of 369
Sizing Router Buffers
, 2004
"... All Internet routers contain buffers to hold packets during times of congestion. Today, the size of the buffers is determined by the dynamics of TCP’s congestion control algorithm. In particular, the goal is to make sure that when a link is congested, it is busy 100 % of the time; which is equivalen ..."
Abstract
-
Cited by 352 (17 self)
- Add to MetaCart
(Show Context)
All Internet routers contain buffers to hold packets during times of congestion. Today, the size of the buffers is determined by the dynamics of TCP’s congestion control algorithm. In particular, the goal is to make sure that when a link is congested, it is busy 100% of the time, which is equivalent to making sure its buffer never goes empty. A widely used rule-of-thumb states that each link needs a buffer of size B = RTT × C, where RTT is the average round-trip time of a flow passing across the link, and C is the data rate of the link. For example, a 10Gb/s router linecard needs approximately 250ms × 10Gb/s = 2.5Gbits of buffers, and the amount of buffering grows linearly with the line-rate. Such large buffers are challenging for router manufacturers, who must use large, slow, off-chip DRAMs. And queueing delays can be long, have high variance, and may destabilize the congestion control algorithms. In this paper we argue that the rule-of-thumb (B = RTT × C) is now outdated and incorrect for backbone routers. This is because of the large number of flows (TCP connections) multiplexed together on a single backbone link. Using theory, simulation and experiments on a network of real routers, we show that a link with n flows requires no more than B = (RTT × C)/√n, for long-lived or short-lived TCP flows. The consequences on router design are enormous: a 2.5Gb/s link carrying 10,000 flows could reduce its buffers by 99% with negligible difference in throughput; and a 10Gb/s link carrying 50,000 flows requires only 10Mbits of buffering, which can easily be implemented using fast, on-chip SRAM.
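As a back-of-envelope check on the two rules, here is a minimal sketch using the figures quoted in the abstract (variable names and units are our own):

    import math

    RTT = 0.250                         # average round-trip time, seconds
    C = 10e9                            # link rate, bits per second
    n = 50_000                          # TCP flows sharing the link

    B_rule_of_thumb = RTT * C                   # B = RTT x C
    B_small = RTT * C / math.sqrt(n)            # B = (RTT x C) / sqrt(n)

    print(f"rule of thumb: {B_rule_of_thumb / 1e9:.2f} Gbits")  # 2.50 Gbits
    print(f"small buffers: {B_small / 1e6:.1f} Mbits")          # ~11.2 Mbits

The √n factor is what lets a 10Gb/s linecard trade 2.5Gbits of slow off-chip DRAM for roughly 10Mbits of fast on-chip SRAM.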
The Globus Striped GridFTP Framework and Server
- In SC ’05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing
, 2005
"... The GridFTP extensions to the File Transfer Protocol define a general-purpose mechanism for secure, reliable, high-performance data movement. We report here on the Globus striped GridFTP framework, a set of client and server libraries designed to support the construction of data-intensive tools and ..."
Abstract
-
Cited by 137 (21 self)
- Add to MetaCart
(Show Context)
The GridFTP extensions to the File Transfer Protocol define a general-purpose mechanism for secure, reliable, high-performance data movement. We report here on the Globus striped GridFTP framework, a set of client and server libraries designed to support the construction of data-intensive tools and applications. We describe the design of both this framework and a striped GridFTP server constructed within the framework. We show that this server is faster than other FTP servers in both single-process and striped configurations, achieving, for example, speeds of 27.3 Gbit/s memory-to-memory and 17 Gbit/s disk-to-disk over a 30 Gbit/s network with a 60 millisecond round-trip time. In another experiment, we show that the server can support 1800 concurrent clients without excessive load. We argue that this combination of performance and modular structure makes the Globus GridFTP framework both a good foundation on which to build tools and applications, and a unique testbed for the study of innovative data management techniques and network protocols.
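The striping idea itself is simple to sketch: partition a file's byte range across k parallel data streams by block offset. The sketch below is our own illustration, not the Globus GridFTP API:

    from collections import defaultdict

    def stripe_blocks(file_size: int, block: int, k: int):
        """Assign each fixed-size block of a file to one of k streams."""
        plan = defaultdict(list)
        for i, offset in enumerate(range(0, file_size, block)):
            length = min(block, file_size - offset)
            plan[i % k].append((offset, length))    # round-robin layout
        return plan

    # Example: a 1 MiB file in 256 KiB blocks over 2 streams.
    print(dict(stripe_blocks(1 << 20, 256 << 10, 2)))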
CUBIC: A New TCP-Friendly High-Speed TCP Variant
, 2005
"... This paper presents a new TCP variant, called CUBIC, for high-speed network environments. CUBIC is an enhanced version of BIC: it simplifies the BIC window control and improves its TCP-friendliness and RTT-fairness. The window growth function of CUBIC is governed by a cubic function in terms of th ..."
Abstract
-
Cited by 114 (2 self)
- Add to MetaCart
This paper presents a new TCP variant, called CUBIC, for high-speed network environments. CUBIC is an enhanced version of BIC: it simplifies the BIC window control and improves its TCP-friendliness and RTT-fairness. The window growth function of CUBIC is governed by a cubic function of the elapsed time since the last loss event. Our experience indicates that the cubic function provides good stability and scalability. Furthermore, the real-time nature of the protocol keeps the window growth rate independent of RTT, which keeps the protocol TCP-friendly under both short and long RTT paths.
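The growth function stated in the paper is W(t) = C(t − K)³ + W_max, where t is the time since the last loss and K is chosen so the curve starts at the post-loss window. A minimal sketch, with typical (not prescriptive) constants:

    C_CUBIC = 0.4       # scaling constant
    BETA = 0.2          # multiplicative decrease: window drops 20% on loss

    def cubic_window(t: float, w_max: float) -> float:
        # K = cbrt(W_max * beta / C) makes W(0) = (1 - beta) * W_max.
        k = (w_max * BETA / C_CUBIC) ** (1.0 / 3.0)
        return C_CUBIC * (t - k) ** 3 + w_max

    for t in (0.0, 2.0, 4.0, 6.0):
        print(t, round(cubic_window(t, w_max=100.0), 1))

The plateau around t = K gives CUBIC its stability near the previous saturation point, and because W depends on elapsed time rather than on ACK arrival, the growth rate is independent of RTT.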
Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication
"... This paper presents a practical solution to a problem facing high-fan-in, high-bandwidth synchronized TCP workloads in datacenter Ethernets—the TCP incast problem. In these networks, receivers can experience a drastic reduction in application throughput when simultaneously requesting data from many ..."
Abstract
-
Cited by 93 (1 self)
- Add to MetaCart
(Show Context)
This paper presents a practical solution to a problem facing high-fan-in, high-bandwidth synchronized TCP workloads in datacenter Ethernets—the TCP incast problem. In these networks, receivers can experience a drastic reduction in application throughput when simultaneously requesting data from many servers using TCP. Inbound data overfills small switch buffers, leading to TCP timeouts lasting hundreds of milliseconds. For many datacenter workloads that have a barrier synchronization requirement (e.g., filesystem reads and parallel data-intensive queries), throughput is reduced by up to 90%. For latency-sensitive applications, TCP timeouts in the datacenter impose delays of hundreds of milliseconds in networks with round-trip times in microseconds. Our practical solution uses high-resolution timers to enable microsecond-granularity TCP timeouts. We demonstrate that this technique is effective in avoiding TCP incast collapse in simulation and in real-world experiments. We show that eliminating the minimum retransmission timeout bound is safe for all environments, including the wide-area.
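The mechanism is easy to sketch: keep the standard RTO estimator (RFC 6298 gains) in microsecond units and drop the conventional 1-second minimum bound. This is our own rendering, not the paper's code:

    class RtoEstimator:
        """Smoothed RTT / RTT-variance estimator, microsecond units."""
        def __init__(self):
            self.srtt = None
            self.rttvar = None

        def update(self, rtt_us: float) -> float:
            if self.srtt is None:
                self.srtt, self.rttvar = rtt_us, rtt_us / 2
            else:
                self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt_us)
                self.srtt = 0.875 * self.srtt + 0.125 * rtt_us
            return self.srtt + 4 * self.rttvar      # no max(1 s, ...) floor

    est = RtoEstimator()
    for sample in (100.0, 120.0, 90.0):             # datacenter RTTs, in us
        rto = est.update(sample)
    print(f"RTO = {rto:.0f} us")                    # hundreds of us, not ms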
Data Center TCP (DCTCP)
"... Cloud data centers host diverse applications, mixing workloads that require small predictable latency with others requiring large sustained throughput. In this environment, today’s state-of-the-art TCP protocol falls short. We present measurements of a 6000 server production cluster and reveal impai ..."
Abstract
-
Cited by 74 (6 self)
- Add to MetaCart
(Show Context)
Cloud data centers host diverse applications, mixing workloads that require small predictable latency with others requiring large sustained throughput. In this environment, today’s state-of-the-art TCP protocol falls short. We present measurements of a 6000-server production cluster and reveal impairments that lead to high application latencies, rooted in TCP’s demands on the limited buffer space available in data center switches. For example, bandwidth-hungry “background” flows build up queues at the switches, and thus impact the performance of latency-sensitive “foreground” traffic. To address these problems, we propose DCTCP, a TCP-like protocol for data center networks. DCTCP leverages Explicit Congestion Notification (ECN) in the network to provide multi-bit feedback to the end hosts. We evaluate DCTCP at 1 and 10Gbps speeds using commodity, shallow-buffered switches. We find DCTCP delivers the same or better throughput than TCP, while using 90% less buffer space. Unlike TCP, DCTCP also provides high burst tolerance and low latency for short flows. In handling workloads derived from operational measurements, we found DCTCP enables the applications to handle 10X the current background traffic, without impacting foreground traffic. Further, a 10X increase in foreground traffic does not cause any timeouts, thus largely eliminating incast problems.
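DCTCP's control law, as described in the paper, fits in a few lines: the sender keeps a running estimate alpha of the fraction of ECN-marked packets and cuts its window in proportion to alpha instead of halving it. The gain g = 1/16 below is a typical value, not mandated:

    G = 1.0 / 16.0                       # estimation gain

    def update_alpha(alpha: float, marked: int, total: int) -> float:
        f = marked / total if total else 0.0    # fraction marked this window
        return (1 - G) * alpha + G * f          # alpha <- (1-g)*alpha + g*F

    def on_mark(cwnd: float, alpha: float) -> float:
        return cwnd * (1 - alpha / 2)           # cwnd <- cwnd * (1 - alpha/2)

    alpha = update_alpha(0.0, marked=2, total=100)
    print(on_mark(100.0, alpha))    # ~99.9: gentle cut when marking is rare

When every packet is marked, alpha tends to 1 and the cut approaches TCP's halving; when marking is rare, the window barely shrinks, which is where the 90% buffer saving comes from.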
Layering as optimization decomposition
- PROCEEDINGS OF THE IEEE
, 2007
"... Network protocols in layered architectures have historically been obtained on an ad hoc basis, and many of the recent cross-layer designs are conducted through piecemeal approaches. They may instead be holistically analyzed and systematically designed as distributed solutions to some global optimiza ..."
Abstract
-
Cited by 63 (23 self)
- Add to MetaCart
(Show Context)
Network protocols in layered architectures have historically been obtained on an ad hoc basis, and many of the recent cross-layer designs are conducted through piecemeal approaches. They may instead be holistically analyzed and systematically designed as distributed solutions to some global optimization problems. This paper presents a survey of the recent efforts towards a systematic understanding of “layering” as “optimization decomposition”, where the overall communication network is modeled by a generalized Network Utility Maximization (NUM) problem, each layer corresponds to a decomposed subproblem, and the interfaces among layers are quantified as functions of the optimization variables coordinating the subproblems. There can be many alternative decompositions, each leading to a different layering architecture. This paper summarizes the current status of horizontal decomposition into distributed computation and vertical decomposition into functional modules such as congestion control, routing, scheduling, random access, power control, and channel coding. Key messages and methods arising from much recent work are listed, and open issues are discussed. Through case studies, it is illustrated how “Layering as Optimization Decomposition” provides a common language to think ...
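A toy instance makes the decomposition concrete. For NUM with logarithmic utilities, dual decomposition yields the classic pattern: sources pick rates from path prices (the congestion-control layer), while links adjust prices from load (the queueing/scheduling layer). The routing matrix, weights, and step size below are our own example, not from the paper:

    import numpy as np

    R = np.array([[1, 1, 0],        # links x sources: who uses which link
                  [0, 1, 1]], dtype=float)
    c = np.array([1.0, 2.0])        # link capacities
    w = np.array([1.0, 1.0, 1.0])   # weights in U_s(x) = w_s * log(x_s)
    p = np.ones(2)                  # link prices (dual variables)

    for _ in range(5000):
        q = R.T @ p                                   # price along each path
        x = w / q                                     # source rate update
        p = np.maximum(p + 0.01 * (R @ x - c), 1e-6)  # link price update

    print(np.round(x, 3))           # converges to the NUM-optimal rates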
Understanding XCP: Equilibrium and fairness
- in Proc. IEEE INFOCOM, 2005
"... Abstract—We prove that the XCP equilibrium solves a constrained max-min fairness problem by identifying it with the unique solution of a hierarchy of optimization problems, namely those solved by max-min fair allocation, but solved by XCP under an additional constraint. This constraint is due to the ..."
Abstract
-
Cited by 56 (4 self)
- Add to MetaCart
(Show Context)
Abstract—We prove that the XCP equilibrium solves a constrained max-min fairness problem by identifying it with the unique solution of a hierarchy of optimization problems, namely those solved by max-min fair allocation, but solved by XCP under an additional constraint. This constraint is due to the “bandwidth shuffling” necessary to obtain fairness. We describe an algorithm to compute this equilibrium and derive lower and upper bounds on link utilization. While XCP reduces to max-min allocation at a single link, its behavior in a network can be very different. We illustrate that the additional constraint can cause flows to receive an arbitrarily small fraction of their max-min fair allocations. We confirm these results using ns2 simulations. Index Terms—Congestion control, max-min, optimization.
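For reference, unconstrained max-min on a single link is the familiar progressive-filling computation; XCP's distributed shuffling mechanism approximates this subject to the extra constraint the paper identifies. A sketch of the single-link baseline (our illustration):

    def max_min(capacity: float, demands: list) -> list:
        alloc = [0.0] * len(demands)
        active = set(range(len(demands)))
        remaining = capacity
        while active:
            share = remaining / len(active)
            small = {i for i in active if demands[i] <= share}
            if not small:
                for i in active:
                    alloc[i] = share        # equal split among the rest
                break
            for i in small:                 # satisfy small demands fully
                alloc[i] = demands[i]
                remaining -= demands[i]
            active -= small
        return alloc

    print(max_min(10.0, [1.0, 4.0, 9.0]))   # [1.0, 4.0, 5.0]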
Evaluation of advanced TCP stacks on fast long-distance networks
- in: Proceedings of the UltraNet workshop
, 2003
"... With the growing needs of data intensive science, such as High Energy Physics, and the need to share data between multiple remote computer and data centers worldwide, the necessity for high network performance to replicate large volumes (TBytes) of data between remote sites in Europe, Japan and the ..."
Abstract
-
Cited by 41 (0 self)
- Add to MetaCart
(Show Context)
With the growing needs of data-intensive science, such as High Energy Physics, and the need to share data between multiple remote computer and data centers worldwide, the necessity for high network performance to replicate large volumes (TBytes) of data between remote sites in Europe, Japan and the U.S. is imperative. Currently, most production bulk-data replication on the network utilizes multiple parallel standard (Reno-based) TCP streams. Optimizing the window sizes and number of parallel streams is time consuming, complex, and varies (in some cases hour by hour) depending on network configurations and loads. We therefore evaluated new advanced TCP stacks that do not require multiple parallel streams while giving good performance on high-speed long-distance network paths. In this paper, we report measurements made on real production networks with various TCP implementations on paths with different Round Trip Times (RTT) using both optimal and sub-optimal window sizes. We compared the New Reno TCP with the ...
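The tuning burden the abstract describes follows from two standard back-of-envelope formulas: the bandwidth-delay product fixes the window a single stream needs, and the Mathis et al. model bounds Reno throughput by the loss rate. The numbers below are illustrative, not from the paper:

    import math

    rtt = 0.150                 # trans-continental RTT, seconds
    rate = 1e9                  # target rate, bits/s
    mss = 1460 * 8              # segment size, bits
    p = 1e-6                    # packet loss probability

    bdp = rate * rtt                            # window needed for full rate
    reno = (mss / rtt) * math.sqrt(1.5 / p)     # ~ MSS/RTT * sqrt(3/(2p))

    print(f"window needed: {bdp / 8 / 1e6:.1f} MB")       # ~18.8 MB
    print(f"one Reno stream: ~{reno / 1e6:.0f} Mbit/s")   # ~95 Mbit/s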
Performance isolation and fairness for multi-tenant cloud storage
- In OSDI
, 2012
"... Shared storage services enjoy wide adoption in commercial clouds. But most systems today provide weak performance isolation and fairness between tenants, if at all. Misbehaving or high-demand tenants can overload the shared service and disrupt other well-behaved tenants, leading to unpredictable per ..."
Abstract
-
Cited by 40 (2 self)
- Add to MetaCart
(Show Context)
Shared storage services enjoy wide adoption in commercial clouds. But most systems today provide weak performance isolation and fairness between tenants, if at all. Misbehaving or high-demand tenants can overload the shared service and disrupt other well-behaved tenants, leading to unpredictable performance and violating SLAs. This paper presents Pisces, a system for achieving datacenter-wide per-tenant performance isolation and fairness in shared key-value storage. Today’s approaches for multi-tenant resource allocation are based either on per-VM allocations or hard rate limits that assume uniform workloads to achieve high utilization. Pisces achieves per-tenant weighted fair shares (or minimal rates) of the aggregate resources of the shared service, even when different tenants’ partitions are co-located and when demand for different partitions is skewed, time-varying, or bottlenecked by different server resources. Pisces does so by decomposing the fair sharing problem into a combination of four complementary mechanisms—partition placement, weight allocation, replica selection, and weighted fair queuing—that operate on different time-scales and combine to provide system-wide max-min fairness. In evaluation, our Pisces storage prototype achieves nearly ideal (0.99 Min-Max Ratio) weighted fair sharing, strong performance isolation, and robustness to skew and shifts in tenant demand. These properties are achieved with minimal overhead (<3%), even when running at high utilization (more than 400,000 requests/second/server for 10B requests).
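Of the four mechanisms, weighted fair queuing is the easiest to sketch; a compact deficit-round-robin variant (our illustration, not Pisces code) serves each tenant's queue in proportion to its weight:

    from collections import deque

    def drr(queues: dict, weights: dict, quantum: int, rounds: int) -> dict:
        served = {t: 0 for t in queues}
        deficit = {t: 0 for t in queues}
        for _ in range(rounds):
            for t, q in queues.items():
                deficit[t] += quantum * weights[t]   # credit by weight
                while q and q[0] <= deficit[t]:
                    cost = q.popleft()               # size of head request
                    deficit[t] -= cost
                    served[t] += cost
        return served

    queues = {"A": deque([10] * 100), "B": deque([10] * 100)}
    print(drr(queues, {"A": 2, "B": 1}, quantum=10, rounds=10))
    # tenant A gets ~2x tenant B's service: {'A': 200, 'B': 100}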
The SILO Architecture for Services Integration, controL, and Optimization for the Future Internet
- In: IEEE International Conference on Communications, ICC '07
, 2007
"... Abstract — We propose a new internetworking architecture that represents a departure from current philosophy and practice, as a contribution to the ongoing debate regarding the future Internet. Building upon our experience with the design and prototyping of the Just-in-Time protocol suite, we outlin ..."
Abstract
-
Cited by 38 (4 self)
- Add to MetaCart
(Show Context)
Abstract — We propose a new internetworking architecture that represents a departure from current philosophy and practice, as a contribution to the ongoing debate regarding the future Internet. Building upon our experience with the design and prototyping of the Just-in-Time protocol suite, we outline a framework consisting of (1) building blocks of fine-grain functionality, (2) explicit support for combining elemental blocks to accomplish highly configurable complex communication tasks, and (3) control elements to facilitate (what is currently referred to as) cross-layer interactions. In this position paper, we take a holistic view of network design, allowing applications to work synergistically with the network architecture and physical layers to select the most appropriate functional blocks and tune their behavior so as to meet the application’s needs within resource availability constraints. The proposed architecture is flexible and extensible so as to foster innovation and accommodate change; it supports a unified Internet; it allows for the integration of security and management features at any point in (what is now referred to as) the networking stack; and it is positioned to take advantage of hardware-based performance-enhancing techniques.
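To make the composition idea tangible, here is a toy rendering of a "silo" as an ordered pipeline of fine-grain service blocks chosen per application; names and structure here are entirely our illustration, not the SILO prototype:

    from typing import Callable, List

    Block = Callable[[bytes], bytes]

    def make_silo(blocks: List[Block]) -> Block:
        """Compose fine-grain service blocks into one send path."""
        def silo(payload: bytes) -> bytes:
            for block in blocks:            # each block adds one service
                payload = block(payload)
            return payload
        return silo

    def framing(p: bytes) -> bytes:
        return len(p).to_bytes(2, "big") + p

    def checksum(p: bytes) -> bytes:
        return p + bytes([sum(p) % 256])

    send = make_silo([framing, checksum])   # app selects only what it needs
    print(send(b"hello"))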