A Scalable, Commodity Data Center Network Architecture
, 2008
Cited by 91 (9 self)
Today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Nonuniform bandwidth among data center nodes complicates application design and limits overall system performance. In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today’s higher-end solutions. Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP.
Load Balanced Birkhoff-von Neumann Switches, Part II: Multi-stage Buffering
, 2001
Cited by 89 (12 self)
The main objective of this sequel is to solve the out-of-sequence problem that occurs in the load balanced Birkhoff-von Neumann switch with one-stage buffering. We do this by adding a load-balancing buffer in front of the first stage and a resequencing-and-output buffer after the second stage. Moreover, packets are distributed at the first stage according to their flows, instead of their arrival times in Part I. In this paper, we consider multicasting flows with two types of scheduling policies: the First Come First Served (FCFS) policy and the Earliest Deadline First (EDF) policy. The FCFS policy requires a jitter control mechanism in front of the second stage to ensure proper ordering of the traffic entering the second stage. For the EDF scheme, there is no need for jitter control. It uses the departure times of the corresponding FCFS output-buffered switch as deadlines and schedules packets according to their deadlines. For both policies, we show that the end-to-end delay through our multistage switch is bounded above by the sum of the delay from the corresponding FCFS output-buffered switch and a constant that only depends on the size of the switch and the number of multicasting flows supported by the switch.
MULTIPROCESSOR SCHEDULING TO ACCOUNT FOR INTERPROCESSOR COMMUNICATION
, 1991
Cited by 64 (11 self)
Interprocessor communication (IPC) overheads have emerged as the major performance limitation in parallel processing systems, due to the transmission delays, synchronization overheads, and conflicts for shared communication resources created by data exchange. Accounting for these overheads is essential for attaining efficient hardware utilization. This thesis introduces two new compile-time heuristics for scheduling precedence graphs onto multiprocessor architectures, which account for interprocessor communication overheads and interconnection constraints in the architecture. These algorithms perform scheduling and routing simultaneously to account for irregular interprocessor interconnections, and schedule all communications as well as all computations to eliminate shared resource contention. The first technique, called dynamic-level scheduling, modifies the classical HLFET list scheduling strategy to account for IPC and synchronization overheads. By using dynamically changing priorities to match nodes and processors at each step, this technique attains an equitable tradeoff between load balancing and interprocessor communication cost. This method is fast, flexible, widely targetable, and displays promising performance. The second technique, called declustering, establishes a parallelism hierarchy upon the precedence graph using graph-analysis techniques which explicitly address the tradeoff between exploiting parallelism and incurring communication cost. By systematically decomposing this hierarchy, the declustering process exposes parallelism instances in order of importance, assuring efficient use of the available processing resources. In contrast with traditional clustering schemes, this technique can adjust the level of cluster granularity to suit the characteristics of the specified architecture, leading to a more effective solution.
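The idea of dynamically changing priorities in the abstract above can be illustrated with a small sketch. It is not the thesis's algorithm: it assumes identical, fully connected processors, charges a communication cost only when producer and consumer sit on different processors, and ignores link contention and routing, all of which the thesis addresses. The priority used here, static level minus earliest start time, is one common formulation of a dynamic level; the names `dynamic_level_schedule`, `exec_time`, and `edges` are hypothetical.

```python
from collections import defaultdict


def dynamic_level_schedule(exec_time, edges, num_procs):
    """Greedy list scheduling with a dynamic priority (illustrative only).

    exec_time: {node: execution time}
    edges:     {(producer, consumer): communication cost if on different processors}
    Returns {node: (processor, start_time)}.
    """
    succs = defaultdict(list)
    preds = defaultdict(list)
    for (u, v), cost in edges.items():
        succs[u].append(v)
        preds[v].append((u, cost))

    # Static level as in HLFET: longest execution-time path from a node to a sink.
    static_level = {}

    def level(n):
        if n not in static_level:
            static_level[n] = exec_time[n] + max((level(s) for s in succs[n]), default=0)
        return static_level[n]

    for n in exec_time:
        level(n)

    proc_free = [0] * num_procs   # time at which each processor becomes idle
    placed = {}                   # node -> (processor, start time)
    finish = {}                   # node -> finish time

    while len(placed) < len(exec_time):
        ready = [n for n in exec_time
                 if n not in placed and all(u in placed for u, _ in preds[n])]
        best = None
        # Evaluate every ready node on every processor and keep the best pair.
        for n in ready:
            for p in range(num_procs):
                data_ready = max(
                    (finish[u] + (0 if placed[u][0] == p else cost) for u, cost in preds[n]),
                    default=0,
                )
                start = max(data_ready, proc_free[p])
                dyn_level = static_level[n] - start
                if best is None or dyn_level > best[0]:
                    best = (dyn_level, n, p, start)
        _, n, p, start = best
        placed[n] = (p, start)
        finish[n] = start + exec_time[n]
        proc_free[p] = finish[n]
    return placed


tasks = {"A": 1, "B": 1, "C": 1}
costly = dynamic_level_schedule(tasks, {("A", "B"): 10, ("A", "C"): 10}, 2)
cheap = dynamic_level_schedule(tasks, {("A", "B"): 0, ("A", "C"): 0}, 2)
```

With expensive communication the sketch keeps all three tasks on one processor; with free communication it spreads the two successors across both processors, which is the load-balancing versus communication tradeoff the abstract refers to.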
Making Parallel Packet Switches Practical
- IN IEEE INFOCOM
, 2001
Cited by 39 (5 self)
A parallel packet switch (PPS) is a switch in which the memories run slower than the line rate. Arriving packets are spread (or load-balanced) packet-by-packet over multiple slower-speed packet switches. It is already known that with a speedup of , a PPS can theoretically mimic an FCFS output-queued (OQ) switch. However, the theory relies on a centralized packet scheduling algorithm that is essentially impractical because of high communication complexity. In this paper, we attempt to make a high performance PPS practical by introducing two results. First, we show that small co-ordination buffers can eliminate the need for a centralized packet scheduling algorithm, allowing a fully distributed implementation with low computational and communication complexity. Second, we show that without speedup, the resulting PPS can mimic an FCFS OQ switch within a delay bound.
Designing Least Cost Nonblocking Broadband Networks
, 1997
Cited by 36 (3 self)
Integrated network technologies, such as ATM, support multimedia applications with vastly different bandwidth needs, connection request rates, and holding patterns. Due to their high level of flexibility and communication rates approaching several gigabits per second, the classical network planning techniques, which rely heavily on statistical analysis, are less relevant to this new generation of networks. In this paper, we propose a new model for broadband networks and investigate the question of their optimal topology from a worst-case performance point of view. Our model is more flexible and realistic than others in the literature, and our worst-case bounds are among the first in this area. Our results include a proof of intractability for some simple versions of the network design problem, and efficient approximation algorithms for designing nonblocking networks of provably small cost. More specifically, assuming some mild global traffic constraints, we show that a minimum-cost non...
Analysis of the Parallel Packet Switch Architecture
- IEEE/ACM TRANSACTIONS ON NETWORKING
, 2003
Cited by 29 (2 self)
Our work is motivated by the desire to design packet switches with large aggregate capacity and fast line rates. In this paper, we consider building a packet switch from multiple lower speed packet switches operating independently and in parallel. In particular, we consider a (perhaps obvious) parallel packet switch (PPS) architecture in which arriving traffic is demultiplexed over identical lower speed packet switches, switched to the correct output port, then recombined (multiplexed) before departing from the system. Essentially, the packet switch performs packet-by-packet load balancing, or inverse multiplexing, over multiple independent packet switches. Each lower speed packet switch operates at a fraction of the line rate . For example, each packet switch can operate at rate . It is a goal of our work that all memory buffers in the PPS run slower than the line rate. Ideally, a PPS would share the benefits of an output-queued switch, i.e., the delay of individual packets could be precisely controlled, allowing the provision of guaranteed qualities of service. In this
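The demultiplex, switch, and recombine structure described in this abstract can be sketched as a toy model. The abstract does not say how packets are assigned to the lower speed switches, so the per-input round-robin used here is an assumption, as is the use of arrival sequence numbers to restore order at the multiplexer; timing, buffering, and line rates are not modeled. The name `pps_switch` is hypothetical.

```python
from collections import deque


def pps_switch(arrivals, num_layers):
    """Toy parallel packet switch.

    arrivals: list of (input_port, output_port, payload) in arrival order.
    Returns (departures, loads) where departures maps each output port to its
    payloads in arrival order and loads counts packets handled by each layer.
    """
    layers = [deque() for _ in range(num_layers)]
    next_layer = {}  # per-input round-robin pointer (assumed policy)

    # Demultiplexer: spread each input's packets over the layers.
    for seq, (inp, out, payload) in enumerate(arrivals):
        layer = next_layer.get(inp, 0)
        layers[layer].append((seq, out, payload))
        next_layer[inp] = (layer + 1) % num_layers

    # Each layer switches its packets to the requested output port.
    per_output = {}
    for layer in layers:
        for seq, out, payload in layer:
            per_output.setdefault(out, []).append((seq, payload))

    # Multiplexer: recombine per output and restore arrival (FCFS) order.
    departures = {out: [payload for _, payload in sorted(items)]
                  for out, items in per_output.items()}
    loads = [len(layer) for layer in layers]
    return departures, loads


departures, loads = pps_switch(
    [(0, 1, "a"), (0, 1, "b"), (1, 1, "c"), (0, 2, "d")], num_layers=2
)
```

Here packets "a" and "b" from input 0 travel through different layers yet leave output 1 in their original order, which is the behavior an output-queued switch would show.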
Nonblocking Multirate Networks
- SIAM JOURNAL ON COMPUTING
, 1989
Cited by 27 (7 self)
An extension of the classical theory of connection networks is defined and studied. This extension models systems in which multiple connections of differing data rates share the links within a network. We determine conditions under which the Clos and Cantor networks are strictly nonblocking for multirate traffic. We also determine conditions under which the Benes network and variants of the Cantor and Clos networks are rearrangeable. We find that strictly nonblocking operation can be obtained for multirate traffic with essentially the same complexity as in the classical context.
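For orientation, the classical (single-rate) conditions that this abstract takes as its baseline can be written down directly. The sketch below covers only the classical three-stage Clos network, with `n` inputs per ingress switch, `m` middle switches, and `r` ingress switches; it does not reproduce the paper's multirate conditions, which the abstract does not state. The function names are hypothetical.

```python
def classify_clos(n, m):
    """Classical single-rate classification of a three-stage Clos network.

    n: inputs per ingress switch, m: number of middle-stage switches.
    """
    if m >= 2 * n - 1:
        return "strictly nonblocking"   # Clos's condition
    if m >= n:
        return "rearrangeable"          # Slepian-Duguid condition
    return "blocking"


def clos_crosspoints(n, m, r):
    """Crosspoint count of a symmetric three-stage Clos network.

    r ingress switches of size n x m, m middle switches of size r x r,
    and r egress switches of size m x n.
    """
    return 2 * r * n * m + m * r * r


kinds = [classify_clos(4, 7), classify_clos(4, 4), classify_clos(4, 3)]
cost = clos_crosspoints(n=4, m=7, r=4)
```

A 16-port network built this way with `n = 4`, `m = 7`, `r = 4` is strictly nonblocking in the classical sense and uses 336 crosspoints.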
Scalable Interactive Volume Rendering Using Off-the-Shelf Components
- In Proc. IEEE Symp. Parallel Large-Data Vis. Graphics (PVG)
, 2001
Cited by 22 (0 self)
This paper describes an application of a second generation implementation of the Sepia architecture (Sepia-2) to interactive volumetric visualization of large rectilinear scalar fields. By employing pipelined associative blending operators in a sort-last configuration a demonstration system with 8 rendering computers sustains 24 to 28 frames per second while interactively rendering large data volumes (1024x256x256 voxels, and 512x512x512 voxels). We believe interactive performance at these frame rates and data sizes is unprecedented. We also believe these results can be extended to other types of structured and unstructured grids and a variety of GL rendering techniques including surface rendering and shadow mapping. We show how to extend our single-stage crossbar demonstration system to multi-stage networks in order to support much larger data sizes and higher image resolutions. This requires solving a dynamic mapping problem for a class of blending operators that includes Porter-Duff compositing operators.
CR Categories: C.2.4 [Computer Systems Organization]: Computer-Communication Networks---Distributed Systems; C.2.5 [Computer Systems Organization]: Computer-Communication Networks---Local and Wide Area Networks; C.5.1 [Computer System Implementation]: Large and Medium ("Mainframe") Computers---Super Computers; D.1.3 [Software]: Programming Techniques---Concurrent Programming; I.3.1 [Computing Methodologies]: Computer Graphics---Hardware Architecture; I.3.2 [Computing Methodologies]: Computer Graphics---Graphics Systems; I.3.3 [Computing Methodologies]: Computer Graphics---Picture/Image Generation; I.3.7 [Computing Methodologies]: Computer Graphics---Three-Dimensional Graphics and Realism
Keywords: sort-last, parallel, cluster, shear-warp, volume rendering, ray-c...
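As a rough illustration of why associativity matters for the pipelined blending this entry describes, the sketch below implements the Porter-Duff "over" operator on premultiplied RGBA pixels and checks that grouping does not change the result. It is a single-pixel toy, not the Sepia-2 implementation; the names `over` and `composite` are hypothetical.

```python
from functools import reduce
from math import isclose


def over(front, back):
    """Porter-Duff 'over' for premultiplied RGBA tuples (r, g, b, a)."""
    k = 1.0 - front[3]  # fraction of the back pixel that remains visible
    return tuple(f + k * b for f, b in zip(front, back))


def composite(pixels_front_to_back):
    """Blend a front-to-back ordered list of premultiplied pixels."""
    return reduce(over, pixels_front_to_back)


a = (0.5, 0.0, 0.0, 0.5)    # half-transparent red, premultiplied
b = (0.0, 0.25, 0.0, 0.25)  # mostly transparent green, premultiplied
c = (0.0, 0.0, 1.0, 1.0)    # opaque blue

left_grouped = over(over(a, b), c)
right_grouped = over(a, over(b, c))
same = all(isclose(x, y) for x, y in zip(left_grouped, right_grouped))
```

Because the two groupings agree, partial images from neighboring renderers can be blended in any bracketing as long as their front-to-back order is preserved, which is what lets the blending be spread across pipeline or network stages.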
COTS Data-Center Ethernet for Multipathing over Arbitrary Topologies
- NSDI
, 2010
Cited by 21 (1 self)
Operators of data centers want a scalable network fabric that supports high bisection bandwidth and host mobility, but which costs very little to purchase and administer. Ethernet almost solves the problem – it is cheap and supports high link bandwidths – but traditional Ethernet does not scale, because its spanning-tree topology forces traffic onto a single tree. Many researchers have described “scalable Ethernet” designs to solve the scaling problem, by enabling the use of multiple paths through the network. However, most such designs require specific wiring topologies, which can create deployment problems, or changes to the network switches, which could obviate the commodity pricing of these parts. In this paper, we describe SPAIN (“Smart Path Assignment In Networks”). SPAIN provides multipath forwarding using inexpensive, commodity off-the-shelf (COTS) Ethernet switches, over arbitrary topologies. SPAIN precomputes a set of paths that exploit the redundancy in a given network topology, then merges these paths into a set of trees; each tree is mapped as a separate VLAN onto the physical Ethernet. SPAIN requires only minor end-host software modifications, including a simple algorithm that chooses between pre-installed paths to efficiently spread load over the network. We demonstrate SPAIN’s ability to improve bisection bandwidth over both simulated and experimental data-center networks.
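The step of merging precomputed paths into loop-free sets, one per VLAN, can be illustrated with a greedy first-fit sketch. This is an assumption about the general shape of such a merge, not SPAIN's actual algorithm: path computation is omitted, paths are taken as given, and a path joins the first VLAN whose links stay cycle-free after adding it. The names `is_forest` and `merge_paths_into_vlans` are hypothetical.

```python
def is_forest(edges):
    """Return True if the undirected edge set contains no cycle (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for edge in edges:
        u, v = tuple(edge)
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this link would close a loop
        parent[root_u] = root_v
    return True


def merge_paths_into_vlans(paths):
    """Greedily pack paths into loop-free link sets (illustrative first-fit).

    paths: list of node sequences, each a simple path.
    Returns (vlans, assignment): a list of link sets and, for each path,
    the index of the VLAN that carries it.
    """
    vlans = []
    assignment = []
    for path in paths:
        path_edges = {frozenset(pair) for pair in zip(path, path[1:])}
        for vlan_id, vlan_edges in enumerate(vlans):
            if is_forest(vlan_edges | path_edges):
                vlan_edges |= path_edges  # extend this VLAN in place
                assignment.append(vlan_id)
                break
        else:
            vlans.append(set(path_edges))  # no VLAN fits: open a new one
            assignment.append(len(vlans) - 1)
    return vlans, assignment


# Square topology A-B-D-C-A with two disjoint A-to-D paths plus one B-to-C path.
vlans, assignment = merge_paths_into_vlans(
    [["A", "B", "D"], ["A", "C", "D"], ["B", "A", "C"]]
)
```

In this example the two A-to-D paths cannot share a VLAN because together they form a loop, so they land in separate VLANs, while the third path fits into the first VLAN without creating a cycle.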

